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ABSTRACT 


We  examine  two  categories  of  solution  algorithms  for  the 
large-scale  multicommodity  transshipment  problem  (MCTP) : 
resource  direction  and  price  direction.  In  the  former 
category  we  construct  RDLB,  a  new  algorithm  which  uses  a 
simplified  projection  method  in  the  subgradient  capacity 
reallocations  and  conjugate  subgradient  directions  with 
approximate  line  search  to  provide  better  termination 
conditions  in  the  Lagrangean  lower-bounding  iteration.  In 
the  latter  category,  we  develop  DDC,  a  dual  decomposition, 
and  we  introduce  RSD(P)  and  RSD(A) ,  new  non-linear 
decomposition  algorithms  for  the  MCTP  based  on  penalty 
transformations  of  the  original  problem  and  using  restricted 
simplicial  decomposition. 

Computational  results  are  presented  for  four-  and  ten- 
product  versions  of  a  problem  with  an  underlying  network  of 
3,300  nodes  and  10,400  arcs.  Results  show  RDLB  stalls 
before  reaching  optimality,  apparently  a  common  problem  in 
primal  subgradient  reallocations,  while  the  RSD  algorithms 
reach  near-optimal  solutions  up  to  10  times  faster  than  a 
direct  primal  simplex-based  solver,  and  display  very 
favorable  convergence  rates  compared  to  DDC.  As  a  final 
test,  RSD(A)  and  DDC  are  applied  to  a  100-product  problem 
totaling  330,000  nodes  and  1,040,000  arcs.   RSD(A)  reaches 


an  acceptable  solution  within  4%  of  optimality  in  under  17 
minutes,  while  DDC  terminates  after  68  minutes  with  a  12% 
gap  remaining  around  the  optimal  solution. 
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I.   INTRODUCTION 

Minimum-cost  single-commodity  capacitated  transshipment 
problems  are  now  solved  routinely  using  primal  network 
simplex  codes.  See,  for  instance,  (Bradley,  Brown,  and 
Graves,  1977) .  However,  when  multiple  products  flow  over 
the  same  network,  the  pure  network  structure  can  be 
confounded  by  the  presence  of  constraints  limiting  the  total 
flow  of  all  products  on  each  arc.  Thus,  single-commodity 
solvers  may  not  be  applied  directly,  and,  as  the  number  of 
products  increases,  the  size  of  the  constraint  matrix  grows 
so  rapidly  that  conventional  simplex  solvers  become  useless. 

Because  it  is  a  frequently  encountered  problem, 
specialized  algorithms  for  the  multicommodity  transshipment 
problem  (MCTP)  have  been  widely  studied  and  reported  in  the 
literature.  However,  none  of  the  methods  has  been  so 
generally  successful  that  it  dominates  other  methods,  as 
primal  network  algorithms  do  in  the  single-commodity  arena. 

This  paper  documents  new  algorithms  for  the  MCTP  which 
fall  into  the  broad  categories  of  "resource-directive"  and 
"price-directive."  The  first  algorithm  is  an  enhancement  of 
a  popular  resource-directive  approach  using  subgradient 
optimization,  but  incorporating  a  simplified  primal 
projection  together  with  conjugate  subgradient  directions  in 
the  Lagrangean  lower  bound  steps.   The  algorithm  promises 
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ease  of  computation  in  the  primal  restriction  and  improved 
criteria  for  termination  over  previous  algorithms,  but 
apparently  shares  a  common  problem  of  subgradient-based 
resource-direction:  stalling  in  the  primal  before  reaching 
optimal ity. 

The  set  of  price-directive  algorithms  we  present 
includes  a  variant  of  dual  (Dantzig-Wolfe)  decomposition  and 
a  new  family  of  decomposition  algorithms  using  a  penalty 
transformation  of  the  original  MCTP  to  create  a  non-linear 
master  problem  which  is  solved  by  restricted  simplicial 
decomposition.  This  is  a  unique  approach  to  the  MCTP  not 
previously  attempted  which  exhibits  superior  convergence 
rates  in  computational  tests  compared  to  other  solution 
methods.  The  algorithm  is  quite  general  and  therefore 
applicable  to  a  wide  variety  of  linear  programs  exhibiting 
complicating  constraints. 

We  introduce  the  MCTP  in  the  following  section,  and  then 
give  an  overview  of  major  solution  approaches  to  the  MCTP  in 
Section  B,  and  a  more  detailed  literature  review  in  Section 
C.  The  resource-directive  algorithm  is  presented  in  Chapter 
II  and  the  price-directive  algorithms  in  Chapter  III. 
Computational  results  appear  in  Chapter  IV  and  Chapter  V 
presents  conclusions  and  areas  for  future  research. 


A.   STATEMENT  OF  THE  PROBLEM 

The  general  form  of  linear  program  in  which  we  are 
interested  is: 

(LP)      min  ex  (duals) 

St   Aj^x  <  hi  i'^i) 

A2X  =  b2  (U2) 

0  <  X  <  b3 

where  one  set  of  constraints,  A2X  =  b2/  0  <.  x  <  b3  is 
deemed  to  be  easy  to  solve,  and  a  second  set,  A^x  <  h-^, 
complicates  the  problem.  The  vector  u  =  (U2^,U2)  is  the  set 
of  dual  multipliers  associated  with  the  constraints  with 
^1  <  0  and  U2  unrestricted  in  sign. 

The  following  notation  will  be  used  throughout  this 
presentation.  We  assume  all  products  flow  over  the  same 
underlying  network.  Let  |Z|  denote  the  cardinality  of  a  set 
Z. 

T  =  (I, J)  is  a  transshipment  network  with  a  set  of  nodes, 
I,  and  a  set  of  arcs,  J. 

P  is  the  set  of  products  flowing  on  T. 

i  €  I  is  a  node  of  the  network. 

j  e  J  is  an  arc  in  the  network. 

p  e  P  is  a  product  using  network  T. 

Np   is  an  (|I|  ^  |J|)  node-arc  incidence  matrix  for  each 
product  (N]^  =  ...  =  Nipi). 

N    is  an  ( I  1 1  •  I P| )  X  ( I  J  I  •  I P| )  matrix  with  the  Np  matrices 
along  the  diagonal,  O's  elsewhere. 
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c  =  (C]_,  .  .  .  ,  Cp,  .  .  .  ,  c  I  p|  }  is  a  vector  of  costs  on  the  arcs, 
length  | J] • iP| . 

X  =  (X]_,  .  .  .  ,  Xp,  .  .  .  ,5^1  pi  }  is  a  vector  of  flows  on  the  arcs, 
length  | J] * |P| • 

hi      is  a  vector  of  joint  capacities  with  length  |J| 

b2   =  right  hand  side  with 

b2pi  >  0  if  product  p  has  a  supply  at  node  i, 
b2pi  <  0  if  product  p  has  a  demand  at  node  i,  and 
^2pi  ~  ^  otherwise. 

A  is  a  |J|  X  |J|*|P|  matrix  =  {I,...,I}. 

We  specialize  LP  to  MCTP  by  letting  A2  =  N,  b3  =  h-^,    and 
dropping  the  subscript  on  A^: 


(MCTP) 


min 

ex 

(duals) 

(1.1) 

St 

Ax  <  b^^ 
NX  =  b2 

(ui) 
(U2) 

(1.2) 
(1.3) 

0  <  Xp  <  b]^ 

for 

all 

P 

e  P  . 

(1.4) 

The  easy  constraints  are  the  single-product  pure  network 
flow  constraints  (1.3,1.4),  and  the  complicating  constraints 
are  the  joint  capacitation  constraints  (1.2).   We  assume  for 
notational  simplicity  that  all  arcs  have  joint  capacities, 
although  in  practice  only  a  subset  have  such  restrictions. 
For  convenience,  let 

F  =  {x|Nx  =  b2,  0  <  Xp  <  bj^  for  all  p  e  P) 

be  the  set  of  all  feasible  single-commodity  flows  with  joint 
capacity   constraints   relaxed.    In   the   absence   of   the 
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complicating  these  constraints   (1.2),   the  MCTP  decouples 
into  a  set  of  independent  single  commodity  networks. 

The  Lagrangean  dual  of  MCTP  with  respect  to  the  joint 
capacity  constraints  is  found  by  placing  these  constraints 
in  the  objective  with  multipliers  U]^  <  0: 


(LR)      max  min  L(U]^,x)   =   ex  -  U]_(Ax-b2^) 
ui<0  x>0 

st    X  e  F 


According  to  duality  theory,  if  x*  solves  MCTP,  then  L(U]^,x) 
<  ex*   for  U]^  <  0  and  x  £  F.    Furthermore,   a  solution 
(ui**,x**)  to  LR  has  L(Ui**,x**)  =  ex*.   Thus,  we  may  use  LR 
to  generate  a  lower  bound  on  MCTP  by  fixing  U]^  <  0  and 
solving  the  following  problem: 


(LR(U3^))       min  ex  -  U]^(Ax-b]^) 

X 

st    X  e  F  , 

and  this  bound  will  be  tight  if  u^^  is  chosen  correctly. 
Solving  the  Lagrangean  dual  generally  allows  some 
constraints  to  be  violated  with  penalty  u^^CAx-b^^).  For  any 
X  e  F,  we  denote  the  set  of  violated  constraints  as 

Jv   =   (JIJ  ^  ^'    ^i^   >  ^1 i )    •  (1-5) 
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B.   OVERVIEW  OF  SOLUTION  TECHNIQUES 

Solution  methods  for  large-scale  linear  programs  may  be 
divided  into  three  broad  categories:  direct  factorization 
or  compact  inverse  methods,  indirect  resource-directive 
methods,  and  indirect  price-directive  methods.  Factoriza- 
tion approaches  exploit  specific  structure  inherent  within 
the  constraints  to  produce  a  compact  basis  representation 
with  which  the  steps  of  a  primal  or  dual  simplex  algorithm 
may  be  performed.  Direct  factorization  is  not  pursued  in 
this  study  (see,  for  instance,  Graves  and  McBride,  1976) . 

The  other  two  approaches,  resource  and  price  direction 
are  decompositions  which  divide  the  original  problem  into  a 
master  problem  and  subproblems  which  exchange  information  to 
solve  the  original  problem  indirectly  by  iteration. 

The  resource-directive  approach  solves  the  MCTP  as  a  two 
part  minimization: 

min  min  ex 
y>0  x>0 

St       Nx      =  b2 

X  -  y  <  0 

Ay  =  bi 

yp  <  hi      for  all  p  e  P  . 

The  vector  y  allocates  joint  capacities  to  the  individual 
commodities.  For  fixed  y,  the  solution  to  the  inner 
minimization  is 
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(RS (y) )  V(y)  =  min  ex 

St   Nx  =  b2 
0  <  X  <  y 

which  is  the  "subproblem"  or  "subproblems"  since  the 
individual  commodities  are  no  longer  coupled  and  may  be 
solved  independently. 

The  outer  minimization  can  now  be  written  as 

(RD)  min  V(y) 

st    Ay   =  b]^ 

yp  <  b]^   for  all  pep 
y   >  0  . 

Standard  methods  for  solving  (RD)  are  Benders 
decomposition  and  subgradient  optimization.  Benders 
decomposition  creates  a  master  problem  which  makes 
successive  tangential  approximations  to  V(y);  the  tangent 
planes  are  derived  by  solving  subproblems  (RS(y))  whose 
capacity  (resource)  allocations  are  determined  by  the  master 
problem.  More  details  of  Benders  decomposition  are  given  in 
Chapter  III  but  the  method  is  not  pursued  because  the  master 
problems  become  unwieldy. 

Subgradient  optimization  solves  (RD)  in  a  fashion  which 
is  analogous  to  a  projected  gradient  algorithm  which  could 
be  used  if  V(y)  were  dif  ferentiable.  Given  a  feasible 
allocation  y^  in  the  kth  iteration,  a  feasible  reallocation 
is  obtained  by 
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yk+1  =  yk  4.  s^d^ 

where  s^  is  a  scalar  step  length  and  dj^  is  a  feasible 
direction.  Since  V(y)  is  not  everywhere  dif ferentiable,  d'^ 
is  a  projected  subgradient  rather  than  a  projected  gradient. 
Standard  subgradient  optimization  does  not  use  a  line  search 
to  determine  s^  since  d^  is  not  guaranteed  to  be  a  descent 
direction.  However,  we  devise  an  algorithm  which  performs 
an  approximate  line  search  when  applicable  in  the  lower 
bound  routine.  Subgradients  are  deteirmined  via  solutions  of 
the  subproblems  RS(y);  the  master  problem  of  this  procedure 
is  the  reallocation  mechanism. 

We  present   price   direction   as   a   class   of   penalty 
problems, 


max  min   ex  +  P(z,Ax-b]^)  (1-6) 

z>0   X 

st         X  e  F 


where  P(*)  is  a  penalty  term  involving  those  constraints 
deemed  to  be  complicating.  The  vector  z  is  determined 
differently  for  various  forms  of  P(*),  but  in  general  is 
involved  in  constructing  an  optimal  vector  of  dual 
multipliers. 

Letting  P(U]^,  Ax-b^^)  =  -u^^CAx-b^),  we  see  that  (1-6) 
becomes  LR,  -z  e  u^  estimates  the  optimal  dual  multipliers, 
and  by  fixing  U]^  <  0,  the  inner  minimization  amounts  to 
solving  the  Lagrangean  subproblem  LR(U]^)  . 
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Solution  of  the  Lagrangean  dual,  i.e.,  (1.6)  with  linear 
penalties,  cannot  guarantee  an  optimal  solution  to  MCTP 
although  optimal  u^^*  and  x*  for  MCTP  are  in  the  set  of 
solutions  to  LR.  Nevertheless,  optimizing  LR  is  useful  for 
bounding  purposes  and  it  is  sometimes  possible  to  obtain 
good  solutions  to  MCTP  from  LR.  The  method  used  in  this 
work  to  solve  LR  is  subgradient  optimization. 

Subgradient  optimization  for  LR  simply  updates  the 
multiplier  estimates  at  each  iteration  k,  by  the  formula 

U]^^"*"^  =  min  (0,ui^  -  s^(Ax^-bi)) 

while  controlling  the  scalar  steplengths  s^,  resolving  the 
Lagrangean  to  obtain  a  new  x^"*"-^  and  seeking  feasibility  only 
indirectly  via  the  penalty  terms.  The  dual  update  mechanism 
is  the  master  problem  for  LR  while  the  subproblems  are 
LR(Ui^) . 

Standard  dual  (Dantzig-Wolfe)  decomposition  may  be 
interpreted  as  a  special  method  of  solving  (1.6)  with 
piecewise-linear  penalty  function, 

P(z,Ax-b2^)  =  -z(Ax-b3^) 

and  Zj  =  °°      if  (Ax-b]^)  j  >  0 

Zj  =  0       if  (Ax-b2^)j  <  0 

Zj    [0,°°)   if  (Ax-bi)j  =  0  . 

Z  is  determined  by  solving  the  master  linear  program  of 
Chapter  III. A,   yielding  the  duals  U]^  =  -z.    Subproblem 
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solutions  attempt  to  provide  descent  directions  for  this 
penalty  function. 

P(*)  may  also  take  standard  non-linear  forms  such  as  the 
quadratic  function,  .  5h|  |  Q(x)  |  |  ^ ,  where  z  is  replaced  here 
by  the  scalar  h,  Q(x)  =  (AjX-b]^j )  ■*"  =  max(0,  AjX-b^^j )  ,  and 
I  I  •  I  I  denotes  the  Euclidean  norm.  Penalty  function  theory 
tells  us  precisely  how  to  solve  this  problem:  solve  the 
inner  minimization  as  h  -^  °°  .  As  h  ^  °°  ,  x  ^  x*  and  hCAx-b]^)"^ 
converges  to  an  optimal  set  of  dual  multipliers. 

However,  starting  with  h  large  produces  a  highly  ill- 
conditioned  problem,  so  the  problem  is  usually  solved  for  a 
sequence  of  increasing  values  for  h,  producing  a  sequence  of 
improving,  but  infeasible  solutions.  An  augmented 
Lagrangean  penalty  function  is  investigated  for  reducing 
ill-conditioning  and  speeding  convergence. 

A  reasonable  approach  for  solving  these  nonlinear 
penalty  problems  is  some  feasible  descent  method.  For  fixed 
h,  feasible  descent  directions  are  derived  for  ex  +  P(h,Ax- 
b)  at  X  =  X  by  solving  the  linear  subproblem 

/\        ^ 
min   Vx(cx  +  P(h,  Ax-b]^)  )  x 

st  X  £  F 

to  obtain  x.  The  Frank-Wolfe  algorithm  (1956)  would  perform 
a  linear  search  in  the  direction  x-x  to  obtain  a  new  x  and 
iterate.  This  type  of  master  problem  tends  to  have  poor 
convergence  in  practice  so  we  employ  Restricted  Simplicial 
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Decomposition  (Hearn  et  al.,  1984)  which  maintains  a 
restricted  set  of  extreme  points  of  F  together  with  x. 
Instead  of  solving  a  simple  line  search  the  master  problem 
solves  a  nonlinear  program  over  the  convex  hull  of  the 
retained  points.  The  loss  of  simplicity  in  the  master 
problem  is  typically  offset  by  improved  convergence  of  the 
overall  algorithm. 

There  is  no  strong  rationale  for  replacing  a  hard  linear 
problem  with  a  seemingly  more  difficult  nonlinear  problem. 
Consequently,  this  approach  has  not  previously  been 
considered  for  large-scale  applications.  However,  we  show 
in  Chapter  III  that  this  approach  can  be  attractive. 
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C.   SURVEY  OF  RELATED  LITERATURE 

This  section  reviews  the  literature  which  has  led  to  the 
current  state  of  the  art  in  algorithms  for  the  MCTP.  We 
begin  with  a  brief  overview  of  single-commodity  network 
algorithms  because  of  their  computational  importance  in  many 
algorithms  for  the  MCTP.  We  then  mention  prior 
contributions  to  a  few  special  cases  of  the  MCTP,  and 
finally  review  literature  on  algorithms  used  to  solve 
general  MCTPs. 

Single-commodity  network  flow  problems  have  been  widely 
studied  since  the  194  0s,  for  two  primary  reasons:  they  are 
frequently  encountered  and  their  special  structure  lends 
itself  to  algorithms  which  are  more  efficient  than  general 
linear  programming  techniques.  The  constraint  matrix  of  the 
pure  network  problem  is  a  node-arc  incidence  matrix:  all 
O's  and  ±l's,  with  at  most  one  +1  and  one  -1  per  column. 
This  particular  structure  has  three  desirable  properties: 
total  unimodularity,  which  guarantees  integer  primal 
solutions  given  integer  right-hand  sides,  and  integer  dual 
solutions  given  integer  costs,  a  basis  matrix  which  may  be 
triangulated  by  permutation  of  rows  and  columns  and  thus 
simplifying  computation,  and  primal  extreme  point  solutions 
equivalent  to  spanning  trees  in  the  network.  Several 
algorithms  were  developed  in  the  1950s  and  1960s  which  made 
at  least  some  use  of  these  properties.  Primal  simplex 
methods  were  proposed  by  Dantzig  (1963)  and  Fulkerson  and 
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Dantzig  (1955).  Primal-dual  methods  became  popular  in 
practice  at  that  time,  including  Kuhn ' s  Hungarian  Method 
(1955)  and  the  out-of-kilter  algorithm  of  Ford  and  Fulkerson 
(1962)  . 

Primal-dual  methods  were  favored  throughout  the  1960s, 
but  some  works,  most  notably  the  basis-labelling  scheme  of 
Johnson  (1966),  set  the  stage  for  research  which  led  to  the 
efficient  primal  network  algorithms  in  use  today. 
Subsequent  research  focused  on  efficient  data  structures  and 
their  manipulation,  which  led  to  compact  basis 
representations  and  efficient  performance  of  the  simplex 
pivot.  Significant  research  was  done  in  the  1970s  by 
Srinivasan  and  Thompson  (1972,1973),  Glover,  Karney  and 
Klingman  (1974),  Glover,  Karney,  Klingman  and  Napier  (1974), 
Glover,  Klingman  and  Stutz  (1974),  Barr,  Glover,  and 
Klingman  (1979),  and  Bradley,  Brown  and  Graves  (1977). 
Results  presented  by  these  researchers  demonstrated  the 
primal  network  simplex  to  be  up  to  two  orders  of  magnitude 
faster,  to  require  much  less  memory  than  general  simplex 
solvers,  and  to  be  about  40%  faster  than  out-of-kilter 
codes. 

This  research  has  given  us  network  codes,  such  as  GNET 
(Bradley,  Brown  and  Graves,  1977)  ,  which  can  solve  large 
(say,  many  thousands  of  nodes  and  arcs)  single-commodity 
network  problems  very  efficiently.   This  research  is  doubly 
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important  for  the  MCTP  since  it  allows  efficient  subproblem 
solution  in  many  of  the  algorithms  developed  for  the  MCTP. 

Some  of  the  pure  network  structure  found  in  the  single- 
commodity  network  problem  carries  over  into  the  MCTP. 
Single-product  networks  do  appear  as  blocks  along  the 
diagonal  of  the  constraint  matrix,  but,  unfortunately,  the 
joint  capacity  constraints  couple  the  networks  together  and 
generally  destroy  total  unimodularity  of  the  constraint 
matrix,  admitting  fractional  solutions  and  requiring  real 
arithmetic.  It  is  appealing  to  try  to  restore  total 
unimodularity  or  product  independence  by  transformation. 
Some  success  in  this  area  has  been  reported  by  Evans 
(1976, 1978a, 1978b, 1983)  and  by  Evans,  Jarvis  and  Duke 
(1977) .  The  class  of  problems  for  which  their  methods  work 
is  quite  restricted,  however:  total  unimodularity  is  shown 
to  hold  when  the  number  of  sources  or  sinks  for  each  product 
is  less  than  or  equal  to  2,  and  the  existence  of 
transformations  to  single-product  networks  is  shown  when  the 
number  of  sinks  per  product  equals  2.  Thus,  this  is  of 
little  help  to  the  general  MCTP. 

Algorithms  for  the  general  MCTP  are  broadly  classified 
as  direct  methods  which  exploit  special  structure  using  the 
simplex  algorithm  or  indirect  methods  which  use  some  form  of 
decomposition.  The  indirect  methods  include  price-direction 
and  resource-direction. 
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We  first  review  the  development  of  price-direction. 
Recognizing  that  the  number  of  extreme  points  in  a  master 
problem  may  be  huge,  Ford  and  Fulkerson  (1958)  devised  a 
procedure  for  the  multicommodity  maximal  flow  problem  which 
uses  column  generation  to  produce  favorable  extreme  points 
as  needed.  In  their  formulation  of  the  problem,  the 
variables  correspond  to  chains  or  paths  through  the  network, 
and  the  constraints  represent  individual  arcs.  Dantzig  and 
Wolfe  (1960)  formalized  the  procedure  into  the  dual 
decomposition  procedure  in  which  the  master  problem  is 
solved  to  provide  pricing  information  to  the  subproblems. 
Solving  the  subproblems  produces  an  extreme  point,  which  is 
added  to  the  master  problem  if  it  is  favorable,  or  indicates 
optimality  of  the  current  master  problem  solution  if  it  is 
not. 

Tomlin  (1966)  first  formulated  and  implemented  dual 
decomposition  for  the  minimum  cost  MCTP,  showing  the 
equivalence  of  the  Ford-Fulkerson  algorithm  to  the  algorithm 
described  by  Dantzig  and  Wolfe.  Others  have  developed 
extensions  of  the  MCTP  using  the  dual-decomposition 
approach.  Cremeans,  Smith  and  Tyndall  (1970)  and  Wollmer 
(1972)  formulated  a  model  in  which  flow  on  some  arcs  depends 
on  the  availability  of  resources  which  are  shared  with  other 
arcs.  Weigel  and  Cremeans  (1972)  extended  the  model  to 
allow  the  flow  of  each  commodity  to  be  measured  in  distinct 
units,   joint   arc   capacities   in   common   units,   and   to 
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incorporate  node  capacity  constraints.  Swoveland  (1973) 
presented  a  generalization  of  the  MCTP  which  involved  a 
decomposition  in  which  a  single  subproblem  is  solved  in  two 
stages. 

Considered  an  effective  technique  in  early  iterations, 
dual  decomposition  has  a  reputation  for  poor  convergence, 
with  progress  toward  optimality  tailing  off  in  later 
iterations.  Ho  (1984a)  speculated  that  this  failure  to 
converge  is  due  to  numerical  error  in  computer 
implementation.  Ho  and  Loute  (1983)  compared  solution  times 
of  several  problems  for  a  commercial  linear  programming 
package  and  decomposition  codes.  They  concluded  that 
standard  LP  is  more  efficient  when  it  is  practical,  but 
pointed  out  that  decomposition  is  nearly  as  efficient  for 
the  large  problems  tested.  Ho  (1984b)  further  speculates 
that  efficiency  of  the  decomposition  increases  with  problem 
dimension. 

Simplicial  decomposition,  an  indirect  price-directive 
method  applicable  to  non-linear  quasi-convex  programs  was 
introduced  by  von  Hohenbalken  (1977).  Convergence  of  a 
restricted  version  of  simplicial  decomposition  was  recently 
demonstrated  by  Hearn  et  al.,  (1984). 

We  now  turn  to  resource-direction,  which  solves  the  MCTP 
using  the  subproblem 
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RS(y)  V(y)  =  min  ex 

st   Nx  =  b2 
0  <  X  <  y 

and  the  master  problem 

RD  min  V(y) 

st   Ay   =  b]^ 

yp  <  hi      for  all  p  -  P 

y   >  0  . 


Because  the  master  problem  provides  allocations  feasible  in 
MCTP,  RS(y)  produces  an  upper  bound  on  MCTP.  When  RS(y)  has 
a  feasible  solution,  it  is  feasible  in  MCTP.  It  is  a  simple 
matter  for  the  modeller  to  introduce  explicit  artificial  and 
slack  arcs  with  penalty  costs  to  insure  that  any  allocation 
will  produce  a  feasible  solution  to  MCTP. 

Resource-directive  algorithms  proceed  by  iteratively 
solving  the  restriction,  and  then  updating  the  allocations 
in  some  favorable  manner  and  resolving.  Geoff rion  (1970) 
provides  a  good  overview  of  the  resource-directive  approach. 
The  predominant  method  for  computing  new  allocations  is  via 
subgradient  directions,  which  are  a  generalization  of  the 
gradient  for  convex  or  concave  functions  with 
nondif ferentiable  points.  This  method  is  applicable  since 
V(y)  is  a  piece-wise  linear,  convex  function. 
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The  subgradient  approach  appeared  first  for  solving  sets 
of  linear  inequalities  (Agmon,  1954;  Motzkin  and  Schoenberg, 
1954) ,  and  then  appeared  in  the  Russian  literature  adapted 
to  optimization  problems,  e.g.,  Polyak  (1967).  The 
optimization  version  was  first  applied  in  the  Western 
literature  to  the  travelling  salesman  problem  by  Held  and 
Karp  (1970,1971),  and  further  developed  as  a  general  method 
for  optimization  by  Held,  Wolfe  and  Crowder  (1974).  The 
convergence  rate  and  stepsize  considerations  were  studied  by 
Goffin  (1977)  and  Bazaraa  and  Sherali  (1981) .  Fisher  (1985) 
summarized  the  subgradient  approach  to  solving  the 
Lagrangean  dual . 

Resource-directive  algorithms  for  the  MCTP  using 
subgradients  in  the  direction-finding  process  for  optimizing 
RD  were  developed  by  Kennington  and  Shalaby  (1977), 
Rosenthal  (1983),  and  Allen  (1985).  Ali,  Helgason, 
Kennington,  and  Lall  (1980)  declare  their  resource-directive 
algorithm  to  be  faster  than  either  a  price-directive  or  a 
special  basis  factorization  code  in  their  computational 
tests. 

While  subgradient  resource-directive  methods  are 
considered  exact  (Kennington  and  Helgason,  1980) ,  zigzagging 
in  subgradient  methods  is  known  to  be  a  problem  (Sandi, 
1979) ,  so  convergence  to  optimality  is  frequently  slow  at 
best.  Thus,  the  reported  computational  advantage  of 
resource  direction  using  subgradient  updates  is  based  on  its 
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purported  ability  to  reach  an  acceptable  near-optimal  point 
before  simplex-based  methods  reach  optimality  (Ali  et  al., 
1978,  1980;  Kennington  and  Shalaby,  1977). 

The  final  major  category  of  algorithms  for  the  MCTP  is 
basis  factorization  or  compact  inverse  methods.  Algorithms 
in  this  category  employ  a  primal  or  dual  simplex  approach, 
but  exploit  the  special  structure  of  the  LP  basis  for  the 
MCTP  to  reduce  the  size  of  the  basis  inverse  that  must  be 
carried  explicitly.  A  dual  partitioning  method  was 
presented  by  Grigoriadis  and  White  (1972).  Graves  and 
McBride  (1976)  discussed  the  primal  approach  in  the  general 
context  of  mathematical  programming.  Hartman  and  Lasdon 
(1972)  presented  a  theoretical  primal  approach  based  on  the 
Generalized  Upper  Bounding  (GUB)  technique  of  Dantzig  and 
Van  Slyke  (1967).  Incorporating  graph  theory,  they  factored 
the  MCTP  basis  into  a  network  basis  for  each  product  and  a 
"working  basis"  with  dimension  equal  to  the  number  of 
currently  binding  joint  capacity  constraints.  Several 
efficiencies  result.  Primal  network  simplex  techniques 
apply  to  computation  on  pure  network  bases.  Only 
computations  involving  the  working  basis  require  real 
arithmetic,  and  this  basis  is  generally  quite  small. 

Kennington  (1977)  implemented  the  method  and  further 
study  has  been  done  by  Helgason  and  Kennington  (1977).  Ali, 
Harnett  et  al.,  (1984)  concluded  that  the  GUB  approach  is 
three  to  five  times  faster  than  standard  LP  codes,  but  other 
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studies  general!,  rate  resource-airective  algorithms 
superior  to  the  GUB  approach  (see,  for  instance,  Ali,  et 
al.,  1980,  and  Kennington  and  Shalaby,  1977). 
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II.   A  RESOURCE-DIRECTIVE  APPROACH  TO  THE  MCTP 

This  chapter  presents  a  resource-directive  method  for 
solving  the  MCTP  which  incorporates  a  simplified  projection 
mechanism  in  the  primal  subgradient  capacity  reallocations 
and  conjugate  subgradient  directions  with  approximate  line 
search  in  a  Lagrangean  lower-bounding  routine.  The 
procedure  is  based  on  the  method  of  Allen  (1985)  which 
solves  two  essentially  independent  sequences  of  problems, 
the  resource-directive  sequence  to  generate  feasible 
solutions  and  upper  bounds,  and  a  Lagrangean  sequence  to 
generate  lower  bounds.  The  purposes  for  these  enhancements 
to  Allen's  procedure  are  to  provide  a  computationally- 
supportable,  theoretically  sound  projection  in  the 
reallocation  routine,  to  provide  a  better  mechanism  for 
termination  of  the  procedure,  and  to  attempt  to  generate 
ascent  directions  in  the  Lagrangean  routine  to  make  line 
search  worthwhile.  We  first  restate  the  general  form  of  the 
problem  to  be  solved,  and  then  introduce  the  concept  and 
required  theory  of  subgradient  optimization.  At  that  point 
we  will  be  able  to  state  clearly  the  shortcomings  of 
previous  approaches  and  present  the  procedure  based  on  our 
improvements . 

The  resource  directive  approach  attempts  to  solve  the 
problem 
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min  min  ex  (2 . 1) 

y   X 

St        Ax  =  bi  (2.2) 

NX  =  b2  (2.3) 

0  <  Yp  <  b^,  for  all  p  e  P          (2.4) 

x-y  <  0  (2.5) 

X  >  0  .  (2.6) 

By  selecting  a  particular  y  e  Y,  i.e.,  a  set  of  capacity 

allocations   satisfying   (2.2)  and   (2.4),   the   remaining 
subproblem  is 


v(y)    = 

=  min   ex 

(duals) 

St      NX   =   b2 

(U2) 

X   <   y 

(U3) 

X   >    0    , 

which  is  a  set  of  restricted,  single-product  transshipment 
problems.  The  original  problem  is  then  {min  V(y)  st  y  t  Y}, 
so  the  outer  minimization  forms  a  master  problem  which  seeks 
improving  capacity  allocations. 

Before  turning  to  the  subgradient  approach  to  solving 
this  problem,  we  briefly  mention  another  approach  to  this 
problem,  "Benders  decomposition."  First,  we  make  the 
simplifying  assumption  that  all  problems  have  finite 
feasible  solutions  and  recall  the  theorem  of  duality  to 
establish  that  the  optimal  values  of  a  linear  program  and 
its  dual  are  equal.   Now  the  dual   of   our  subproblem   is 
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V(y)  =  max  usbs  -  U3y  (2.7) 

St    U2N   -  U3I   <  c 

U2  free,  U3  >  0  , 

and  letting  the  constraint  set  be  represented  by  its  extreme 
points,  {^2e)'  {^se)'  where  e  e  E,  the  index  set  of  extreme 
points,  it  may  be  rewritten 


max   U2eb2  -  U3ey  . 

e 


Substituting  for  V(y)  in  the  master  problem  yields 


{min  max(U2eb2  ~  ^Se^)  st  y  e  Y}  , 

e 


or,  equivalently , 

min   z 

st    z  >  U2eb2  -  U3ey    e  «^  E 

Ay  =  bi 

0  <  y  <  b^  . 

This  is  the  Benders  master  problem.  In  practice,  the 
extreme  points  of  the  subproblem  are  not  all  known,  so  we 
generate  them  by  iteratively  solving  the  subproblems  to 
produce  a  new  extreme  point,  and  then  add  it  to  the  master 
problem  in  the  form  of  a  constraint.  The  master,  in  turn, 
is  solved  to  produce  a  new  allocation.  This  is  the  process 
of  the  Benders  (1962)  decomposition  algorithm  which  treats  y 
as  a  complicating  variable.    A  full  presentation  of  this 
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method,  which  also  considers  conditions  of  unboundedness , 
may  be  found  in  Lasdon  (1970) .  The  approach  is  not  pursued 
here  because  of  the  large  size  of  the  Benders  master 
problem. 

The  subgradient  approach  attempts  to  solve  the  master 
problem  by  simple  updates  of  the  capacity  allocations  at 
each  iteration,  y^+l  =  Pr[y^  +  s^d^] ,  where  Pr  indicates  a 
projection  such  that  y^"*"!  a  Y,  d^  is  a  subgradient  and  s^  is 
from  a  scalar  step  sequence,  {s^}  satisfying 

I   s^  =   °°,      s^  >  0,   and   s^  ^  0  as  k  ^  °° .       (2.8) 

Polyak  (1967)  demonstrated  that  these  projected  updates 
converge  in  the  limit  to  an  optimal  solution  simply  by 
assuring  that  (2.8)  is  met.  Letting  y^"*"l  =  yk  +  s^d^,  the 
projection  finds  a  feasible  y^"*"!  =  argmin{  |  |  y-y^"*"-^  |  |  ^  st 
y  e    Y}. 

The  subgradient  itself  is  a  generalization  of  the 
gradient  for  a  function  with  nondif ferentiable  points.  If  f 
is  a  convex  function  with  nondif ferentiable  points  defined 
on  a  feasible  region  F,  then  d  is  a  subgradient  of  f  at 
z"  e  F  if 

f(z)  >  f(z')  +  d'(z-z)   for  all  z  e  F  . 

The  set  of  subgradients  at  a  point  is  called  the 
subdif ferential ,  defined  by 

8f(z)  =  (d|f(z)  >  f(z")  +  d'(z-z')  for  all  z  ^  F)    ; 
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when  z  is  a  dif ferentiable  point  of  f,  8f(z)  =  7f(z). 
Furthermore,  there  exists  a  set  of  subgradients  which  are 
the  limits  of  the  one-sided  directional  derivatives  at  z. 
These  particular  subgradients,  referred  to  as  "primitive 
subgradients,"  may  be  used  in  convex  combinations  to 
generate  any  member  of  the  subdif ferential .  This 
presentation  is  made  in  greater  detail  by  Rockafellar  (1970) 
and  Sandi  (1979) . 

Demjanov  (1968)  demonstrated  that  at  any  nondif ferenti- 
able point  of  a  convex  function  f,  there  exists  a 
directional  derivative  -d^  =  f'(z)  e  9f(z)  which  is  the 
locally  best  direction  of  descent.  That  direction  is  the 
minimum  norm  of  the  subdif ferential ;  that  is, 

d^     =     argmin  ( | | g | | ^  st  g  e    3f ( z ) } 

which  is  the  point  of  the  subdif  ferential  closest  to  the 
origin.  Furthermore,  a  point  z*  is  an  unconstrained  optimum 
if  and  only  if  -djn  =  f'(z*)  =  0. 

Although  using  the  minimum  norm  may  be  attractive  as  a 
descent  direction,  the  required  primitive  subgradients  are 
usually  not  all  known;  the  usual  approach  is  to  find  a 
single  subgradient  at  each  iteration  and  update  the 
allocations  using  it  along  with  a  step  sequence  satisfying 
(2.8).  The  method  works  because  any  element  of  the  current 
subdif ferential  forms  an  acute  angle  with  the  true  direction 
to  the  optimal  point,  so  by  taking  a  step  of  the  appropriate 
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length,  we  move  pointwise  closer  to  the  optimal  solution  at 
each  iteration.  However,  it  is  not  necessary  for  the 
objective  function  value  to  improve  at  each  iteration.  The 
sequence  achieves  an  optimal  point  when  a  zero-subgradient 
is  encountered  (Held,  Wolfe,  Crowder,  (1974)). 

The  method  is  attractive  due  to  its  extreme  simplicity 
and  is  commonly  used  as  a  mechanism  for  multiplier 
adjustment  in  Lagrangean  relaxations  (e.g.,  Fisher,  1985) 
and  for  capacity  reallocation  in  resource-directive 
algorithms  for  the  MCTP  (see  Kennington  and  Shalaby  (1977) , 
Rosenthal  (1983),  or  Allen  (1985)). 

The  choice  of  stepsize  sequence  is  critical  to  the 
success  of  subgradient  methods.  For  instance,  the  harmonic 
series,  s^  =  1/k,  satisfies  (2.8)  but  exhibits  poor 
convergence  in  practice  (Bazaraa  and  Sherali,  1981) . 
Convergence  has  also  been  demonstrated  for 


.k  _ 


n(v*-f(zk))/| |gk| |2 


where  0  <  n  <  2  is  a  scalar  multiplier,  v*  is  the  optimal 
objective  function  value,  and  g^  is  the  norm  of  the 
current  subgradient  (Polyak  (1967)).  Since  v*  is  generally 
not  known,  several  researchers  (for  example,  Kennington  and 
Shalaby  (1977)  and  Bazaraa  and  Sherali  (1981)  have  replaced 
it  by  approximating  values  such  as  upper  bounds  or 
combinations  of  bounds.  Coffin  (1977)  discusses  a  geometric 
stepsize  progression  which  exhibits  quadratic  convergence. 
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but  may  not  achieve  an  optimal  point.  Some  good  experiences 
have  been  reported  for  these  methods,  but  these  "heuristic" 
stepsizes  frequently  produce  poor  convergence,  or,  worse, 
may  converge  to  non-optimal  solutions.  It  is  common 
practice  to  include  in  the  process  a  heuristic  rule  for 
modifying  the  value  of  n  when  progress  is  slow. 

We  now  present  the  specific  application  of  the 
subgradient  approach  to  the  MCTP,  first  to  the  resource 
direction  routine  in  Section  A,  then  to  the  Lagrangean 
routine  in  Section  B,  and  finally  present  the  overall 
procedure  in  Section  C. 
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A.   GENERATING  UPPER  BOUNDS  VIA  RESOURCE  DIRECTION 

We  use  the  subgradient  approach  in  resource  direction  to 
perform  the  update,  y^"*"^  =  Pr[y^  +  s^d^]  for  yk,yk+l  ^  y,  to 
solve  the  problem  min  V(y)  st  y  e  Y  where 

y 

V(y)  =  min  ex 

St   NX  =  b2 
X  <  y 
X  >  0  . 

Notice  that  changes  in  capacity  allocations  act  as 
parametric  changes  to  the  right-hand  side  of  the  subproblem, 
so  V(y)  is  convex  with  respect  to  changes  in  y:  a  proof  of 
this  is  given  in  Kennington  and  Helgason  (1980) . 

To  identify  an  appropriate  subgradient,  we  first  recall 
(2.7).  For  any  allocation  y  producing  a  feasible  solution, 
we  have  V(y)  =  U2b2  -  U3y,  where  U2  and  U3  solve  the  dual 
program.  The  following  proof  from  Kennington  and  Helgason 
(1980)  shows  that  -U3  is  a  subgradient  of  the  function  V  at 

y: 

Lemma  2.1:  Let  y  >  0  be  any  feasible  allocation  to  V(y) 
and  (U2,U3)  be  the  associated  optimal  dual 
solution.   Then  -U3  is  a  subgradient  of  V  at 

y- 

Proof:   Let  y,   y  e  Y  be  any  feasible  allocations,   with 
optimal  dual  solutions   (U2,U3),   (U2,U3)   in  V(y)  , 
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V(y) ,  respectively.   Then 

V(y)  -  V(y)  =  U2b2  -  U3y  -  (Usbs  -  U3y)  > 

\i2h2    -   U3y  -  (U2b2-U3y)  =  -U3  (y-y)  ,  or 

V(y)  >  V(y)  -  U3  (y-y)  so  -U3  is  a  subgradient  of  V 

at  y.   QED. 

Therefore,  a  subgradient  is  directly  available  from  the 
subproblem  at  each  iteration  as  the  negative  of  the  optimal 
dual  U3 ,  for  allocation  y. 

If  -U3  were  a  feasible  direction,  the  subgradient 
reallocation  would  be  y^"*"l  =  y^-s^U3^.  However,  since  -U3 
generally  yields  infeasible  allocations,  we  project  the 
infeasible  reallocation  y  =  y'^-s'^U3'^  to  a  feasible 
allocation,  y'^''"^  by  solving  the  quadratic  program 

min  I  I  (y^+l-y) I  1^ 
St  yk+1  .  Y 

An  equivalent  form  of  this  problem  is 

min  I       (ypj^^^-ypj)2   st   yJ^^l  .  Y  , 
P  J 

which  decomposes  on  j .  Solving  a  quadratic  program  for  each 
jointly  constrained  arc  at  each  iteration  is  a  burdensome 
task,  so  a  heuristic  projection  is  generally  used  which  does 
not  involve  the  quadratic  functions. 
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Unfortunately,  computational  results  of  Allen  (1985), 
for  instance,  indicate  that  the  method  is  unable  to  obtain 
near-optimal  solutions  on  even  moderate-sized  problems,  even 
when  the  bulk  of  the  solution  time  is  spent  on  the  primal 
reallocations. 

Two  factors  seem  to  contribute  to  this  failure.  First, 
the  nature  of  the  subgradient  itself  is  that  no  primitive 
element  of  the  subdif ferential  is  guaranteed  to  be  an 
improving  direction.  Second,  the  space  of  the  primal 
variables  is  relatively  large,  complicating  the  problem. 

That  is,  if  Jrp  =  {jiy  ^pj*  =  ^ij  ^*  optimal  in  MCTP}  ,  then 

P 
in  applying  the  subgradient  approach  to  the  Lagrangean 

problem,   we  work  with  0(|Jr[t|)   dual   variables   at   each 

iteration;  in  the  primal  resource  allocation,  we  make 

0  (  I  Jrjn  I  •  I  P|  )  reallocations.   Thus,  as  the  number  of  products 

grows,  the  master  problem  becomes  substantially  harder. 

We   implement   a   projection   method   which   ignores 

0   <   Yp   <   h^     and  maintains  the  bounds  externally  by,   in 

essence,   a   simple  minimum  ratio  test.     The  resulting 

projection  can  be   solved  analytically,   and  applied   in 

practice  through  a  simple  set  of  calculations.   Therefore, 

although   it  does  not   relieve  the  previously  mentioned 

shortcomings  of  the  subgradient  method,  it  does  not  add  to 

them.   Furthermore,  it  preserves  theoretical  convergence  of 

the  method  for  a  stepsize  sequence  satisfying  (2.8). 
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If  we  let  y^  and  U3^  be  the  allocation  and  dual 
variables  obtained  at  iteration  k,  the  infeasible 
reallocation  y  and  the  feasible  (projected)  allocation 
y^"*"!,  then  we  may  express  the  last  two  allocations  as 

y   =   yk  _  s^U3^ 


and 


yk+1   —  yk  _  3^113'^  . 

The  resulting  quadratic  program  is 

min  I  ly^'^^-yl  1^ 
St  Ay^"^l  =  hi    . 

Recognizing  that  yk+l_y  =  -s^U3^  +  s^U3^,  that  Ay^  =  b^^,  and 
that  s^  is  merely  the  scale  factor,  an  equivalent  problem  is 

min  I |U3^-U3^| | ^ 
St   AU3^  =  0  . 

For  a  single  constraint  j,  dropping  the  iteration  count, 
the  resulting  problem  is 


/\    /\ 


min   U3j'u3j  -  U3j'U3j 
St   I'^aj  =  0  . 

The  Kuhn-Tucker  conditions  for  this  problem  are 

U3J  -  U3J  -  1-w  =  0  (2.9) 
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where  w  is  the  dual  variable  associated  with  the  constraint. 

-)-  -^ 

Premultiplying  by  1',  recognizing  that  l'u3j  =  0  and  solving 

we  find  w  =  - (l ' 1) ~^1 'U3 j .    Substituting  into  (2.7)   and 

rearranging,  the  familiar  projection 

->-  ^  ^   ->- 
U3j  =  (I  -  l(l'l)-ll')U3j 

-^  -> 
is  obtained.   Observing  that  (I'l)"^  =  1/|P|  and  rewriting 

to  summation  form,  we  see  that  the  projected  subgradient  for 

the  p-th  product  on  arc  j  is 

^3pj  =  [(|P|-l)U3pj/|P|]  -   y    U3p,j/|P|  =  U3pj-U3j 

PV^P 


where 


U3j  =  I   U3pj/|P| 
P 

The  associated  reallocation  update  is  Yp^"*"-^  =  Yp^  - 
s^(U3p-U3),  where  s^  is  from  an  appropriate  stepsize 
sequence,  { s^} . 

Since  the  bounding  constraints  were  ignored,  Ypj  <  0  or 
Ypj  >  Uj  may  result.  In  this  situation  we  perform  a  minimum 
ratio  test,  setting  0  <  Sj  '  <  Sj^  such  that  0  <  ypj  <  b-^j 
for  all  p  on  arc  j .  We  drop  all  products  with  Xpj  =  0  and 
U3pj  <  U3 j  on  arc  j ,  recomputing  U3 j  and  reallocating  among 
the  remaining  products  to  simplify  the  process 
computationally.   In  addition,  if  some  Xp  1  j  =  bj  with  U3pij 


=  min(U3pj),  no  reallocation  is  performed  for  that  arc 
P 
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Finally,  if  ^spj  =  ^2j     for  all  p  on  some  arc  j,  then  no 
reallocation  occurs  on  that  arc. 

We  write  the  procedure  as  follows,  given  x^,  U3^  from 
the  current  subproblem,  and  a  stepsize  s^: 


REALLOT: 


Compute  U3j^  =  (AjU3)/|P| 

{ For  each  arc,  j : 

Cond  1:   If  there  is  a  p'  with  Xp i j 

and  U3pi j  =  min(U3pj) , 

P 


=  b 


Ij 


let  YpjJ^^l  =  ypjk 


Cond  2 


Cond  3 


If  u 


3pj 


=  u 


3j  for  all  p, 


let  Ypjk+l  =  ypjJ^ 

Identify  P^  =  (P|Xpj  =  0,U3pj  >  U3 j ) 

If  Pjg  7^  p,  recompute 

^3j  =    I       ^3pj/(|P|-iPNl) 

p/Pn 

Over  all  p  /  ^N 


set  ypj^"^^  =  ypj^  -  s-^ 


such  that  0  <  s-;^  <  s*^  and 
0  <  Ypj^'*'^  <  ^ij    fo^  ^11  P 


:  (^3pj-U3j) 
k 


End  for} 


Subgradient  reallocation  need  only  be  performed  for  the 
set  of  joint  arcs  which  are  currently  tight.  For  all  other 
arcs,  if  there  is  some  Xpj=  yp-;  with  favorable  reduced 
costs,  but  AjX  <  b]^j  for  that  j,  we  perform  a  simple 
reallocation  in  the  manner  of  Rosenthal  (1983),  taking  slack 
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from  any  p'  with  Xp i j  <  Yn • j  until  it  is  all  used.  Thus,  in 
the  solution  process,  all  available  slack  on  a  joint  arc  j 
is  offered  to  each  product  until  it  is  consumed,  at  which 
point  that  arc  becomes  a  candidate  for  subgradient 
reallocation. 

In  practice,  we  first  perform  simple  reallocations  until 
no  further  improved  solutions  are  found,  and  then 
incorporate  subgradient  reallocations  into  the  ensuing 
iterations  for  tight  joint  constraints.  The  set  of  tight 
constraints  is  updated  with  each  pass  through  the 
subproblems.  We  note  that  if  at  any  iteration  k,  there  is 
no  arc  j  to  which  a  simple  reallocation  may  be  applied  and 
all  arcs  with  AjX  =  h-^j  also  have  U3pj  =  U3J  for  all  p,  we 
have  achieved  an  optimal  point  (Rosenthal,  1983). 

Unfortunately,  the  optimality  condition  is  not 
frequently  achieved,  nor  is  a  useful  lower  bound  available 
from  the  dual  information  of  the  restricted  problems,  so  we 
establish  lower  bounds  on  MCTP  through  a  Lagrangean 
relaxation  routine  in  hopes  of  terminating  through  bound 
convergence.  The  Lagrangean  approach  is  explained  in  the 
following  section. 
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B.   LAGRANGEAN  RELAXATION  USING  CONJUGATE  SUBGRADIENT 
DIRECTIONS 

Lagrangean  relaxation  is  a  frequently-used  device  to 

assist   in   solving   linear  programs   in  which   a   set   of 

complicating   constraints   are  placed   into  the   objective 

function  with  penalty  multipliers  which  are  estimates  of  the 

optimal  dual  multipliers  to  the  original  problem.   Recalling 

Chapter  I,  the  problem  becomes 

LR  max  min  ex  -  UQ^(Ax-b2^)   st   x  e  F, 

Ul<0  x>0 

which  is  frequently  solved  by  fixing  Uj  <  0  and  solving 

LR(Ui)        Y(U3^)  =  min(c-UiA)x  +  u^b]^ 

st   X  .:  F  , 

to  yield  a  lower  bound  on  the  original  problem.  A  new  U]^  is 
then  found  and  LR(u;]l)  resolved,  repeating  until  optimality 
is  achieved. 

A  common  method  of  accomplishing  the  u-^  updates  is  by 
subgradient  optimization,  where  g  =  (Ax-b]^)  is  a  subgradient 
and 

Uq^^"*"^  =  min(0,U]^^-s^g^) 

is  the  update.  The  "min"  operator  solves  the  projection 
mini  I  U]^'^"*"^-U2^  I  \^  st  U]^^"*"-^  <  0  (see  for  instance,  Fisher, 
1985)  .  However,  the  method  has  several  drawbacks.  First, 
due  to  the  nature  of  the  subgradient,  the  bound  is  not 
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always  improved,  and  due  to  the  sensitivity  of  the  process 
to  choice  of  s^,  convergence  may  be  slow,  or  the  process  may 
converge  to  a  non-optimal  point.  Due  to  the  relaxation,  it 
is  not  necessary  that  a  primal  feasible  solution  ever  be 
generated  and,  since  we  only  determine  one  primitive 
subgradient  at  each  iteration,  we  are  unlikely  to  find  a  0- 
subgradient  and  thus  recognize  an  optimal  solution. 

In  this  section  we  introduce  a  procedure  which  seeks  to 
overcome  some  of  these  difficulties  by  retaining  information 
on  previous  subgradient  directions  in  order  to  approximate 
the  subdif ferential  at  the  current  point.  The  concept, 
developed  originally  by  Wolfe  (1975)  involves  computing  a 
"conjugate  subgradient"  direction  at  each  iteration  which  is 
the  nearest  point  to  the  origin  of  a  set  of  m  retained 
subgradients.  The  method  relies  on  the  property  that  the 
subdif ferential  of  any  point  can  be  arbitrarily  well 
approximated  by  the  convex  hull  of  gradients  of  the  points 
in  its  immediate  neighborhood  (see  Rockafellar,  1970,  for 
details) .  The  algorithm  generates  a  sequence  of  points 
(U]^^}  and  subgradients  (g^}  and  computes  a  search  direction 
d^  as  the  solution  to 

(CD)  min    I  |d|  |2 

m 
st    I        Wj^g^"^  =  d 
n=0 

m 

I        wn  =  1 
n=0 

w   >  0  for  n  =  l,m 
n  — 
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for  some  integer  m.  At  some  step  of  the  algorithm,  some 
subsequence  of  points,  (Ui^^Ui'^"-^,  .  .  .  ,U2^'^~^)  is  close  enough 
together  that 

9f(z^)  =  conv(g^,g^"^,  .  .  .  ,g^""^) 

and 

d^  =  0  . 

When  this  occurs,  we  are  sufficiently  close  to  the  optimal 
solution  to  terminate  with  U]^^  as  an  approximation  for  u^^*. 

Wolfe's  algorithm  calls  for  a  single  variable  search  to 
optimize  the  objective  in  the  conjugate  subgradient 
direction;  this  is  computationally  expensive  since  the  inner 
minimization  involves  solving  the  set  of  network  subproblems 
for  various  step  lengths.  So,  we  find  s^  approximately. 
Define  a  nominal  stepsize  s^   as 

Sn  =  n(V-V)/|  Igl^l  | 

where  I  Ig^l  I  is  the  Euclidean  norm  of  the  current 
subgradient,  and  n  is  a  scalar  multiplier.  V  is  the  current 
upper  bound,  and  V  is  the  current  lower  bound.  Then  we 
evaluate  W(n)  =  V(ui^  +  s^^d^)  for  n  =  0,1,  and  2.  Since  the 
objective  function  value  of  the  dual  of  a  linear  program  is 
piecewise-linear  and  concave  as  a  function  of  the  multiplier 
values  (Bazarra  and  Shetty  (1979))  this  surface  can  be 
approximated  by  fitting  a  quadratic  interpolating  function 
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through  these  three  observations.  Using  the  Newton-Gregory 
forward  equation  to  interpolate  on  equally  spaced  data,  we 
calculate 


W(n)  =  W(0)  +  nWl  +  .5n(n-l)W2  (2.10) 

for  0  <  n  <  2 ,  where 

Wl  =  W(l)  -  W(0) 
and 

W2  =  W(2)  -  2W(1)  -  W(0)  . 

Details  may  be  found  in  Gerald  and  Wheatley  (1984),  for 
example.   To  compute  the  appropriate  step  length,  we  select 

n^  =  argmax  (W(n) |0  <  n  <  2) 

and  set  s^  =  n^(V-V)/ 1  |  g^|  |  .    The  resulting  multiplier 
update  is  Uq^^"^-^  =  max(0,U3^^+s^d^)  ,  and  we  solve  for  V(U2^^''"-^) 
to  complete  the  approximate  line  search. 

Solving  LR(U]^^''"^)  provides  a  new  subgradient  g^"*"^  = 
(Ax^"'"-'--b]^)  "^  which  is  added  to  the  retained  set.  We  are  then 
ready  to  compute  a  new  conjugate  direction  using  (CD) . 

Since  we  are  approximating  the  subdif ferential,  d'^  will 
not  always  be  an  ascent  direction.  In  that  case,  we  simply 
take  a  small  step  in  the  direction  d^,  generate  g'^'''^  to 
improve  our  local  approximation  of  the  subdif ferential , 
compute  a  new  d^"'"-'-,  and  continue. 
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When  d'^  is  near  zero,  we  must  insure  that  the  sequence 

of  U]^'s  generated  is  suitably  close  together  to  adequately 

approximate  the  subdif ferential  at  u^^^.   Wolfe  suggests  this 

m 
amounts  to  the  check  I        Muq^^"^  -  Ui^""^l||  <  M  for  some 

n=l 
M  >  0.   If  this  condition  holds,  U]^^  is  accepted  as  near 

optimal;  if  not,  all  retained  subgradients  are  dropped  and 

the  process  is  restarted  from  u.^^^. 

It  is  possible  to  produce  feasible  capacity  allocations 

from  Lagrangean  solutions,  which  may  be  used  as  starting 

points  for  resource-directive  procedures.   These  allocations 

must  be   integer  to  preserve   integer  arithmetic   in  the 

network  subproblems.    Given  a  current  value  for  x,   the 

capacity  allocation  y  may  be  computed  by 

(CA)         /  Pr[Xpj-(bij/[  Xpj)]         if  j  .  Jv     (2.11) 

[    Pr[Xpj  +  (V  Xpj-bij)/|P|]   if  j  /  Jv 
P 

where  Pr[*]  indicates  a  projection  onto  the  set  of  integers 

satisfying  ^    ypj  =  Uj  and  ypj  >  0,  integer  for  all  p  and 

P 
j  ^-  J.    In  practice,  simple  rounding  is  sufficient. 

Note  that  the  conjugate  subgradient  approach  is  also 

applicable  to  the  resource-direction  problem,  but  we  chose 

not  to  use  the  conjugate  approach  due  to  excessive  storage 

requirements.    While  for  the  Lagrangean  we  must  store  m 

subgradient  vectors,   in  the  resource-directive  problem  we 

must  store  m*|P|  vectors. 
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C.   A  RESOURCE  DIRECTIVE  PROCEDURE  WITH  LAGRANGEAN 
LOWER  BOUNDING 

We  now  present  a  procedure  combining  the  resource 
direction  of  Section  A  with  the  Lagrangean  routine  presented 
in  Section  B  for  solving  the  MCTP.  Upper  bounds  and 
feasible  solutions  are  obtained  via  the  resource  direction, 
while  lower  bounds  are  obtained  from  the  Lagrangean  routine. 
Although  we  present  a  specific  stepwise  arrangement,  the 
resource  directive  and  Lagrangean  routines  are  coupled  only 
by  the  values  of  the  bounds.  They  may  be  performed 
sequentially,  iteratively,  or  in  parallel,  exchanging  bound 
information  when  appropriate. 

Define  the  stopping  criterion  e  >  0  as  the  allowable 
relative  error  between  bounds,  D  >  0  as  the  acceptance 
criterion  for  a  near-zero  direction,  and  U  >  0  as  the 
allowable  Euclidean  separation  of  points  used  to  approximate 
the  current  subdif  ferential .  Let  V  and  V  be  the  current 
bounds  and  x  be  the  incumbent  solution.  Also  let  K  be  the 
maximum  allowable  number  of  iterations. 

The  Algorithm  RDLB  follows: 
Input:    The  network  T  =  {I, J},  and  joint  capacity  vector 
b^,  and  for  each  product  p  =  1,...,|P|,  a  cost 
vector  Cp,  and  a  supply/demand  vector  b2p 
Output:   Incumbent  solution  x,  and  incumbent  value  ex. 

step  0  (Initialize):   Select  e  >  0,  D  >  0,  U  >  0,  m  >  1, 

K  >  1  integer,  set  k=0,  u°  =  0,  J^  =  (^ 
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Solve  LR(0)  ,  compute  J^  =  (jIAjX^-b^^j  >  0}, 
Evaluate  Y(0)  obtaining  x° 
If  J^  ?^  0,  stop  with  x°  optimal  in  MCTP 
Else,  V  =  V(0) 

set  d°  =  g°  =  (Ax°-bi)+ 

Set  y°  =  CA(x°) 

Solve  RS(y°)  with  Xy*  optimal,  set  V  =  V(y°) , 

X  =  Xy* 
If  (V  -  V)/V  <  e,  exit. 

step  1:   Solve  Lagrangean 

la  (Line  Search):   Compute  s  =  n(V-V)/ | | g^| | ,  solve 
W(n)  for  n  =  1,2 
Set  n^  =  argmax{  W(n) ,  0  <  n  <  2} 


s 


k  _ 


=  n^(V-V)/| \q^\   I 


lb   (Move  to  new  point) : 

Set  Ui^"*"^  =  max(0,U;L^  ^   s^d^) 

Solve  LR(Ui^"*"l) 

If  Y(ui^'^l)  >  V,  V  =  V(ui^"*"^) 

if  (V-V)/V  <  e,  exit  with  incumbent  x 

Else,  compute  subgradient  g^"*"-^  where 


g.k+l    = 


Ajx^-b^j    for  j  t  J, 


I. 


otherwise 


set  Jv  =  Jv    {J|AjX-bij  >  0} 
set  G^+1  =  (gJ^+l,...,gJ^-"»+l) 
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Ic   (Compute  New  Direction) : 

Let  d^''"^  =  argmin(  |  |d|  |2  st  d  e   conv(G^+l)  ) 

>  D,  set  k  =  k+1,  go  to  step  1. 

<  D  and  \    \   |  Ui^"^-Ui^""""*"l  |  |  <  u, 
n=0 

k  =  1,  go  to  step  2 


If  I |d^+l| 


<  D  and  \    \   | Ui^-^-Ui^~^+l | |  >  u, 
n=0 

set  dP   =  g^+1, 

ui°  =  M-^,    k  =  0,  G°  =  G^, 

go  to  step  1. 


step  2  (Resource  Direction) : 

2a.   Solve  RS(y)  for  the  current  y^ 


If  V(y)  <  V,  V  =  V(y) ,  X  =  X  * 


If  (V-V)/V  <  e,  exit  with  x 
Else  let  Jrp  =  (jIAjXj^-b^j  =  0} 
2b.   (Reallocate) 

For  all  j  e  J-p,  perform  the  procedure  REALLOT 
to  obtain  Ypj^"^^  =  Ypj^  -  Sj^(U3pj^-U3 j^) 

If  Ypj^''"''"  =  Ypj^  f°^  ^11  3  ^   JT'  exit 
If  k  >  K,  exit 
Else,  go  to  step  21. 

Two  aspects  of  this  algorithm  represent  additional 
computational  burdens  over  a  standard  subgradient  approach 
which  must  be  justified.  First,  the  direction-finding  step 
requires  solving  a  quadratic  program.    This  is  a  small 
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program  of  m  variables  which  may  be  solved  efficiently  as  a 
linear  complementarity  problem  using  the  Kuhn-Tucker 
conditions  (Bazaraa  and  Shetty,  1979)  .  Since  the  purpose  of 
computing  conjugate  directions  is  to  reduce  the  zigzagging 
common  to  subgradient  algorithms,  the  potential  for  fewer 
iterations  justifies  the  effort. 

Second,  the  line  search  requires  two  additional 
subproblem  evaluations  at  each  iteration.  This  means,  in 
effect,  that  3  normal  subgradient  steps  can  be  taken  for 
each  conjugate  step  taken.  It  is  not  clear  that  this  effort 
will  always  be  justified.  An  alternate  form  of  the 
conjugate  subgradient  algorithm  which  overcomes  this  problem 
first  takes  a  series  of  subgradient  steps,  retaining  the 
subgradients,  and  then  periodically  performs  a  conjugate 
subgradient  step  with  quadratic  approximation.  This 
amortizes  the  cost  of  the  search  over  several  iterations. 
After  each  conjugate  step,  old  subgradients  are  dropped  and 
a  new  collection  begins.  Computational  results  presented  in 
Chapter  IV  show  that  the  conjugate  directions  method  does 
generate  a  direction  resulting  in  a  large  increase  to  the 
lower  bound  every  few  iterations,  but  is  rather  slow. 
However,  because  the  entire  procedure  performs  much  worse 
than  other  algorithms  tested,  the  alternate  forms  suggested 
above  have  not  been  tested. 
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III.   PRICE  DIRECTIVE  DECOMPOSITION  ALGORITHMS 

In  this  chapter  we  develop  the  theory  of  dual 
decomposition  and  introduce  a  new  decomposition  algorithm 
which  solves  a  penalized  version  of  the  original  linear 
problem.  This  is  a  procedure  which  uses  restricted 
simplicial  decomposition  to  construct  a  nonlinear  master 
problem  and  conveys  price  information  to  the  subproblems 
through  the  gradient  of  the  objective  function.  With  proper 
choice  of  penalty  parameters,  the  subproblems  quickly 
produce  good  Lagrangean  lower  bounds  on  the  original 
problem,  and  improving  primal  solutions  are  produced  by 
resource  direction  using  the  (infeasible)  penalized  master 
problem  solutions. 

Both  algorithms  are  applicable  to  general  linear 
programs  and  their  development  here  is  accordingly  general. 
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A.   THE  DUAL  DECOMPOSITION  ALGORITHM 

The  standard  approach  in  describing  the  dual 
decomposition  algorithm  is  to  put  forth  the  idea  of 
representing  the  feasible  region  of  a  subset  of  the 
constraints  of  an  LP  as  a  convex  combination  of  its  extreme 
points.  This  reduces  the  number  of  constraints,  but 
introduces  a  large  number  of  variables  (the  extreme  points 
of  the  dualized  constraints)  ,  so  the  problem  becomes  one  of 
finding  the  extreme  points  which,  in  convex  combination, 
describe  the  optimal  solution  to  the  original  problem. 
Dantzig-Wolfe  decomposition  combines  this  extreme  point 
representation  with  column  generation,  which  is  the  process 
of  generating  extreme  points  as  required. 

If  we  let  X  be  a  matrix  whose  columns  are  the  extreme 

points  of  F,  then  any  x  in  F  may  be  expressed  as 

-*■ 
x=Xw,  l-w=l,  w>0  , 

->■ 
where  I'w  =  1,  w  >  0  enforce  convex  combinations  of  the 

extreme  points. 

Substituting  this  form  yields  the  Dantzig-Wolfe  master 

problem,  at  the  k"^^  iteration 

(3.1) 
(3.2) 
(3.3) 
(3.4) 


min 

cX^w^ 

(duals) 

St 

(Ax^)w^   <   hi 

«1 

1-w^   =    1 

^o 

w^   >    0 
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where  X^  is  a  matrix  of  extreme  points  collected  so  far,  and 
(3.3)  and  (3.4)  are  the  convexity  constraints. 

Each  time  the  master  problem  is  solved,  we  check  to  see 
if  it  is  an  optimal  solution  by  computing  the  reduced  costs 
for  the  non-basic  extreme  points  of  F.  It  is  not  necessary 
to  check  them  all  since  the  most  negative  reduced  cost  is 
found  by  solving  the  problem 

min   (c-UiA)x  -  Uq  (3.5) 

st  X  e  F  , 

which  produces  an  extreme  point  of  F.  This  is  the  Dantzig- 
Wolfe  subproblem,  which  for  MCTP  is  a  set  of  single-product 
network  problems  which  are  easy  to  solve. 

If  x^  solves  the  subproblem  at  the  k"^^  iteration  and 
(c-UiA)x^-Uq  <  0,  then  the  reduced  cost  associated  with  x^ 
is  favorable,  so  x^  is  added  to  the  collection  of  extreme 
points  and  we  return  to  the  master  problem.   If  (c-U2A)x^- 
u°  >  0,  then  the  reduced  cost  for  x^  i's  unfavorable.   Since, 
due  to  the  minimization,  it  has  the  minimum  reduced  cost, 
then  all  non-basic  extreme  points  of  F  have  unfavorable 
reduced  costs,  and  the  process  terminates  with  the  solution 
to  the  previous  master  problem  optimal  in  MCTP. 

A  second  approach  may  be  taken  to  the  dual  decomposition 
algorithm  which  considers  the  decomposition  as  a  cutting 
plane  process  (Kelley,  1960) ,  resulting  in  a  master  problem 
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which  is  dual  to  the  standard  master  problem  (e.g.,  Graves 
and  Van  Roy,  1979) . 

The  problem  to  be  solved  is 

(P)      min  ex  (dual  variables) 
St   Ax  <  hi  (ui) 

NX  =  b2        (U2) 
X  >  0 

whose  dual  is 

(D)       max   ub  =  u^^b]^  +  U2b2 

U]_A  +  U2N  <  c 
^1  <  0,  U2  free  . 

The  feasible  regions  associated  with  the  various  constraints 
are 

Fpi  =  (x|Ax  <  bi) 

Fp2  =  (xINx  =  b2} 

Fp  =  Fp3_  and  Fp2  and  (x|x  >  0} 

^Dl  ~  {^l^iA  +  U2N  <  c} 

^D   ^  ^Dl  ^"^  {i^l^l  <  0)- 
The  respective  values  of  the  primal  and  dual  problems  are 

written  V(P)  =  ex  and  V(D)  =  ub . 

Let  f(u,x)  =  ex  -  U]^(Ax-b2^),  which  we  recognize  as  the 

Lagrangean  of  (?)  with  respect  to  the  constraint  set  A. 

Lemma   3.1:   If  x  .  Fp2 ,   x   >  0,   u  e  Fp,   ^2  <  0,   then 
f (u, X)  >   ub. 
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Proof:   Write  the  identity: 

ub  =  ex  -  Ui(Ax-b]L)   "  U2(Nx-b2)   -   (c-U]^A-U2N)  x. 

Rearrange  and  substitute: 

ub  =  f(Ui,x)  -  U2(Nx-b2)  -  (C-U1A-U2N) x. 

But  the  two  right-most  terms  are  <  0,  giving  the 

desired  result.   QED. 

This   suggests   that   by   fixing  Ui,        the   following 
subproblem  results: 


(SUB(ui)) 


min   f(U]^,x)  =  u^^b]^  +  (c-U]_A)x 
St     NX  =  b2  (U2) 

X  >  0 


This  is  the  Lagrangean  relaxation  of  (P)  with  respect  to 
A,  with  u^  fixed.  If  (SUB(U]^))  is  infeasible,  so  is  (P)  ; 
otherwise  we  generate  U]^  using  a  master  problem  in  a 
convergent  algorithm. 

Let  u'^  =  (U]^'^,U2  )  be  a  composite  dual  solution  at  step 
k  of  the  algorithm.  The  following  lemmas  describe  its 
properties: 

Lemma  3.2:  Let  x^  and  U2  be  respective  optimal  primal 
and  dual  solutions  of  (SUB(Ui)).  Then  u^b  = 
f(ui^,x^)  =  V(SUB(ui)),  and  u^  e    Fq. 
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Proof:    Again,  use  the  identity 

ub  =  cx-u^  (Ax-b]^) -U2  (Nx-b2) -(c-U]^A-U2N)  X. 
So   u^b   =   f (Ui^,x^)-U2^(Nx^-b2) -(c-Ui^A-U2^N)x^. 
Since   U2^,x^   are   optimal   in   (SUB(U2^)),   by- 
complementary  slackness 
U2^(Nx^-b2)  =  (c-Ui^A-U2^N)x^  =  0. 
(SUB(U3_^)  optimal  implies 

(c-U2^N) -Ui^A  >  0,  and  u^^^  <  0,  yielding  u^  e    F^. 
QED 

According  to  Lemma  3.2,  for  any  w-^  <  0,  a  feasible 
solution  u^  =  (U2'^,U2^)  ^  Fq  can  be  constructed  with  value 
V(SUB(U]^)  )  .  Let  u  be  the  best  solution  to  (D)  currently 
known.  The  following  lemma  establishes  the  necessary 
conditions  for  improving  the  incumbent  solution,  u. 

Lemma  3.3;   Let  K  =  { k  |  x'^  e   Fp2  /  x'^  >  0}.    A  necessary 
condition   for  V(SUB(u^3^)   to  yield   a   dual 
solution  value  better  than  the  incumbent  value 
ub  is  that  f(u,x^)  >  ub+e,   k  ^  k,  for  some 
e  >  0. 

Proof:  x^  e  Fp2  is  feasible  in  (SUB(U;l))  ^°^  ^^Y  ^1'  s° 
V(SUB(Ui))  <  f(u,x^).  For  V(SUB(Ui))  >  ub,  u^  must 
satisfy  f(U]^,x^)  >  ub+e  for  k   K  and  e  >  0.   QED 

The  existence  of  a  convergent  algorithm  for  finding  u^^ 
is  established  as  follows. 
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Lemma  3.4:  The  sequence  (SUB(U2^))  with  arguments  {u^^^}  is 
finite  if  {x  e  Fp2 ,  x  >  0}  is  bounded,  V(D)  is 
finite,  and  at  any  step  k,  uj^  satisfies 
f(U]_^,x)  >  ub+e  for  1  =  l,...,(k-l)  for  some 
e  >  0. 

Proof:  SUB(U]^^)  yields  basic  solutions,  x'^,  which  are 
finite  in  number.  The  lemma  follows  if  any 
repeated  basic  solution  yields  termination  of  the 
sequence.  By  lemma  2,  u^b  =  f(U]^^,x^)  and  u^  e  Fp. 
If  x^  =  x-^  for  some  1  <  k,  then  u^b  =  f(U]^^,x^)  = 
f(ui^,x-'-)  >  ub+e  since  u^^  satisfies  f(U]^^,x  )  > 
1ab+e  for  1  =  1  .  .  .  (k-1)  .  Thus  u^^^  is  a  new 
incumbent,  and  with  e  >  0  and  V(D)  finite,  there 
may  only  be  a  finite  number  of  such  updates.   QED 

This  criterion  leads  naturally  to  the  master  problem: 

f(U]^,x-^  )  E  cx-^  -Uq^  (Ax-^-b^^)  >  ub+e,  1  =  l,...,k 
Ui  <  0 

Notice  that  the  objective  function  is  unspecified:  any 
objective  will  suffice.  Experience  shows  that  the  most 
recent  cut  works  well  in  practice.   That  is, 

(MP(x))   max   f(ui,x^) 

St    f(U]^,x-'-)  E  cx-^  -U2^(Ax^  -b^)  >  ub+e, 

1  =  1,  .  .  .  ,  (k-1)  ,   U]^  <  0 
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MP(x)  may  be  unbounded,  in  which  case  any  u^^  <  0  such 

that  f(u,xl)  >  ub+e,  1  =  l,...,(k-l)  will  suffice.   When 

V(MP(x))  <  ub+e,  the  process  terminates  with  an  e-optimal 
solution. 

According  to  the  following  lemma,   a  feasible  primal 

solution  to  (P)  can  be  recovered  whenever  V(MP(x))  is 
finite. 

Lemma  3.5:  Let  w-^  >  0,  1  =  l,...,(k-l)  be  the  optimal 
dual  solution  to  MP(x)  at  iteration  k.  If 
V(MP(x))  is  finite,  a  primal  feasible  solution 
to  (P)  is 


k-1  k-1 

xk  =  [xl<^  +   y   wixl]/[l  +  I      wi] 
1=1  1=1 


Proof:    MP(x)  is  restated  in  canonical  form  as 

max   -U]^  (Ax^-b]^)  +  cx^ 

-ui(bi-Ax^)  <  cx^  -[ub+e],  1  =  l,...,(k-l)   (w^) 

By  duality,  the  optimal  dual  solution,  w,  satisfies 

k-1 
I     -    {hi-kx^)wi   <   b]^-Ax^,  yielding 

k-1  k-1 

A[x^  +  I      WiX^]  <  [1  +   I     w-i]bn 

1=1    ^  -1=1   ^    ^ 

Therefore,   x^    ^Pl*      Also   x^   is   a   convex 
combination  of  x^  t  Fp2 ,  so  x^  ■  Fp.   QED 

58 


Lemma  3.6:   If  V(MP(x))  <  ub+e,  then  x  and  u  are  e-optimal 
primal  and  dual  solutions  to  (P) . 

Proof:    V(MP(x))  =  fCui^'^l^x^)  =  cx^-Ui^"*"!  (Ax^-b^)  (primal) 

=  cx^  +  I   [cx-'--(ub+e)  ]wi         (dual) 
=  [cx^+(ub+e) ] [1  +  I   wi]+(ub+e)] 
Rearranging  yields 

cx^-(ub+e)  =  [f  (ui^'^l,x^)-(ub+  e)]/[l  +  I  w^]    and 
cx^  <  f(ui^+l,x^)  E  V(MP(x)).   If  V(MP(x))  <  ub+e, 
then  cx^  <  ub+e.   If  u*  is  the  optimal  solution  of 
(D) ,  since  x^  e  fp,  and 
ub  <  u*b,  u*b  <  cx^  <  ub+e  <  u*b+e.   QED 

Finally,  we  show  that  once  the  master  problem  becomes 
bounded,  it  remains  that  way. 

Lemma   3.7:   The   value   V(MP(x))   remains   bounded   once 
F(MP(x))  becomes  bounded. 

Proof:  V(MP(x))  is  finite  if  F(MP(x))  is  bounded.  Once 
F(MP(x))  becomes  bounded,  subsequent  iterations  add 
constraints  and  make  existing  constraints  tighter 
by  updating  ub,  further  restricting  the  problem. 
QED 

The  dual  decomposition  algorithm,   DDC,  based  on  the 
preceding  theory  follows. 
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Algorithm  DDC: 

step  0:   Specify  u^'  <0,  e>0,  k=l,  ub  =  -°° 

step  1:   Solve  (SUB(Ui^))  for  x^,  U2^  (i-e.,  f(ui^,x^)  > 
ub+e,x^  e  Fp2'^^  ^  ^) 

If  infeasible,  stop  with  (P)  infeasible 
If   u^b  E  (U]^^,U2^)b   >   ub,    update   incumbent 
solution. 

step  2;   Solve  (MP(x))  for  u^^"*"^ 

<  ub+e,  declare  x^,  u  e-optimal  for  (P) ,  stop 
>  ub+e  and  finite,  x^  e  Fp2 ,  k  =  k+1,  go  to 
step  1 
If  V(MP(x))  \  ^00,  use  any  u^^"^!  <  0,  f(ui^"^l,x^)  >  ub+e, 

k  =  k+1,  go  to  step  1 
is  infeasible,  STOP 

Since  the  master  problem  yields  feasible  primal 
solutions,  it  provides  upper  bounds, 

V(MP(x) )  >  cx^  >  u*b  . 

Retaining  all  cuts  may  become  too  expensive  in  time  or 
space  in  solving  MP(x).  It  is  possible  to  conduct  the 
algorithm  as  a  heuristic  by  retaining  only  a  fixed  number  of 
cuts.  This  may  degrade  the  quality  of  the  solution 
obtainable  at  any  iteration  but  does  not  prohibit 
convergence.  A  discussion  of  cut-dropping  strategy  is 
provided  in  the  chapter  on  computational  experience. 
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The  value  of  e  need  not  be  fixed.  A  large  value  may  be 
used  initially  which  may  be  reduced  as  inconsisten-cies  are 
encountered  in  MP(x) .  Parametric  analysis  may  be  used  to 
find  a  maximum  e,  or  e  may  be  reduced  in  a  fixed  algorithmic 
sequence  of  relaxations  of  cut  aspirations. 

As  a  practical  matter,  the  sequence  of  master  problems 
behaves  much  like  a  first-order  descent  method  in  convex 
nonlinear  programming.  Each  cut  is  a  tangential 
approximation  to  the  objective  function  as  can  be  seen  in 
Figure  3.1.  Just  as  in  nonlinear  programming,  we  can  expect 
oscillation  as  the  solution  sequence  progresses. 


\ 


i  V(u,) 


vK)+€ 


Figure  3.1 


Tangential  Approximation  to  Lagrangean 
Dual  Function 
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In  order  to  deal  with  such  oscillation,  a  local 
neighborhood  (or  trust  region)  may  be  specified  for  ui. 
Initially,  this  neighborhood  is  relaxed  to  encompass  values 
of  U]^  sure  to  contain  the  optimum  (i.e.,  viewing  u^  as  a 
penalty  per  unit  of  violation  of  joint  capacity  constraints, 
we  wish  to  assure  a  feasible  completion) .  Subsequently,  the 
trust  region  can  be  restricted  when  oscillation  is  apparent 
(e.g.,  whenever  V(MP(x))  is  non-monotonic  improving). 

As  an  additional  stabilizing  influence,  we  introduce 
decomposition  goals  for  the  master  problem  variables.  These 
goals  may  be  violated  at  a  small  linear  penalty  cost. 

All  the  essentials  for  convergence  are  preserved  as  long 
as  the  cuts  are  satisfied  hierarchically  before  the  goals. 
In  the  same  vein,  the  trust  region  and  cut  dropping 
heuristics  do  not  present  a  serious  impediment  in  practice. 
Brown,  Graves,  and  Honczarenko  (1983)  developed  similar 
mechanisms  for  primal  goal  decomposition  of  mixed  integer 
models. 

DDC  cuts  can  be  restricted  to  Dantzig-Wolfe  cuts  if  we 
force  a  maximal  solution: 

DDC  cuts    cx^  -  U]^(Ax^-b2^)  >  (ub+e)  ,      1  =  1 ,  .  .  .  ,  K j 


DDW  cuts    max  Uq 

st     cx^  -  Ui(Ax^-b]^)  >  Uq  +  ^1^1/   1  =  1,...,K 
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In  this  sense  the  dual  of  the  Dantzig-Wolfe  master 
problem  is  equivalent  to  the  DDW  master  problem.  Taking  the 
dual  of  (3.1)-(3.4)  yields 


max  u^b]^  +  u 


o 


St   Ui(AX^)  +  Uq  <  (cX^) • 


u^  <  0  . 


Rearranging  and  writing  each  constraint  individually,  the 
result  is  Uq  <  ex-'-  -  U3^(Ax-'-)  for  1  =  l,k.  Adding  Uj^b^  to 
each  side  results  in 

Uq  +  UQ^b]^  <  ex-'-  -  Uq^  (Ax-'--b2^)  for  1  =  l,...,k  . 

Before  updating  DDC  to  include  all  the  above  innovations,  we 
introduce  the  following  notation: 

X   is  the  primal  incumbent, 

X^   is  the  matrix  of  extreme  points  generated  up  to 
iteration  k, 

—  < 

V   is  the  upper  bound, 

e,ef  >  0  are  initial  and  final  convergence  tolerances, 

R,Rf  >  0  are  initial  and  final  trust  regions,  and 

^hold'^f  —  ^'  integer  are  number  of  cuts  held,  and 
total  number  of  cuts  allowed,  and 

Wq   is  exponential  moving  average  of  cut  weights. 
The  updated  algorithm  DDCl  becomes: 
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step  0:  (initialize) 

Specify  e>0,  ef>0,  R>0,  Rf>0,  n^oid  ^  ^ 
Set  ub  =  -°° 

step  1:  (solve  aggregated  problem) 

Solve  min  c^Xi      st  N^^x^  =  b^,  0  <  X]^  <  b]^ 

where  ^a  =    I   b2p 
P 

to  find  U2°  <  0 
Set  initial  decomposition  goals. 

step  2:   Solve  (SUB  (uq^^)  )  for  x^,  U2^ 
Generate  cut 
If  u'^b  >  ub,  update  incumbent  solution. 

step  3:  (optimal  capacity  allocation) 

Set  y^  =  CA(x^) ,  solve  RS(y^)  with  Xy^  optimal 
If  V(y^)  <  V,  update  V,  x 
Generate  cut. 

step  4:   Solve  (MP(x))  for  u^^'''^ 

Update  decomposition  goals 

If  V(MP(x))  <  ub+e,  set  e  =  max(e/2,ef) 

If  e  >  ef,  repeat  step  4 

If  V(MP(x))  not  monotonic,  set  R  =  max(R/2,Rf) 
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step   5:       If   V(MP(x))     /   >   ub+e   and    finite,    x^   e    F 


I 


->oo,  use  u^^"*"!  <  0,  fiuj^'^^fX^)    >   ub+e 


is  infeasible,  STOP 
Set  cut  weights  w^"*"!  =  .  8w^  +  .  2w^ 
If  solving  relaxed  master  problem,  recover  primal 
solution  (i.e.,  Xp^  =  X^w^) 


If  cXq^  <  V,  update  V,  x. 


step  6:  (termination  tests)   k  =  k+1 

If  n  <  nf  and  ub  <  V-e  go  to  step  2. 

Cut  generation  is  performed  as  follows: 

Generate  cut 

n  =  n+1 

if  n  >  nj^Q]^(j  then 

locate  slack  cut  with  minimum  weight  and  replace 
it;  if  no  cut  is  slack,  locate  taut  cut  with 
minimum  weight  and  replace  dt,  also  relax  upper 
bound  from  recoverable  primal  solutions. 

Generate  cut  of  desired  class  (e.g.,  Dantzig-Wolfe, 
DDC,  . .  . )  . 
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B.   A  PENALTY  ALGORITHM  FOR  LINEAR  PROGRAMS  USING 
RESTRICTED  SIMPLICIAL  DECOMPOSITION 

Penalty  functions  have  not  been  seriously  considered  as 
vehicles  for  large-scale  mathematical  programming  in  the 
past  because  they  introduce  nonlinearity  (Geoffrion,  1970) . 
However,  the  recent  advent  of  interior  point  or  logarithmic 
barrier  function  methods  has  shown  that  non-linear 
approaches  to  linear  programming  are  at  least  viable  and 
offer  attractive  convergence  rates  and  complexity  properties 
compared  with  simplex-based  methods  (Karmarkar,  1984;  Gill 
et  al.  1986).  In  this  section  we  develop  the  theory  and 
present  an  algorithm  based  on  penalty  function  concepts 
which  is  well-suited  to  large-scale  optimization 
applications.  The  method  has  strong  parallels  to  Dantzig- 
Wolfe  decomposition,  using  restricted  simplicial 
decomposition  to  produce  a  nonlinear  master  problem  and 
linear  subproblems.  The  process  incorporates  Lagrangean 
lower  bounds  in  a  natural  way,  but  produces  infeasible 
solutions  in  the  master  problem.  We  show  how  to  use 
capacity  allocation/restrictions  on  these  infeasible 
solutions  to  produce  improving  upper  bounds.  The  result  is 
a  convergent  algorithm  which  displays  a  superior  convergence 
rate  in  practice.  To  set  the  stage,  we  begin  with  a 
discussion  of  penalty  function  theory. 

Penalty  algorithms  approximate  constrained  optimization 
problems  by  unconstrained  (or  partially  constrained) 
problems.   This  is  done  by  placing  the  constraints  into  the 
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objective  function  with  a  penalty  parameter  which  exacts  a 
large  price  for  any  violation  of  the  constraints.  This 
relaxed  approximation  to  the  original  problem  becomes  more 
accurate  as  the  penalty  parameter  is  increased.  Thus,  the 
penalty  algorithm  generates  a  sequence  of  infeasible  points 
which  converges  in  the  limit  to  an  optimal  solution  of  the 
original  problem. 

From  the  general  form  found  in  (1.6),  we  introduce  the 
specific  penalty  form  of  a  linear  program, 

(PP)  min  q(h,x)  =  ex  +  P(h,x) 

st   x  e  F 

where  P(h,x)  =  ^  \  |Q(x)  |  |^  I  I  *  I  It  indicates  the  t"^^  norm, 
Q(x)  =  (Ax-bi)"*",  (Ax-b]^)"*"  =  max(0,Ax-bi)  ,  scalar  h  >  0, 
1  <  t  <  2,  and  F  =  {x|Nx  =  b2 , 0  <  Xp  <  b^}.  '^his  is  a 
common  form  found  in  any  nonlinear  programming  text  (see, 
for  instance,  Bazaraa  and  Shetty,  1979,  or  Luenberger, 
1984) .  Recalling  that  J^  is  the  set  of  currently  violated 
constraints,  the  objective  function  may  also  be  written  as 


min  q(h,x)  =  ex  +    J   ^Qj(x)^  .  (3.2) 

JeJv 


Since  the  penalty  terms  are  convex  for  h  >  0,  the 
objective  function  remains  convex,  and  for  t  7^  1,  the 
objective  function  is  dif ferentiable  everywhere.  Convexity 
is  proved  for  the  quadratic  penalty  function  in  Appendix 
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A:   convexity  arguments  for  other  forms  may  be  found  in 
Bertsekas  (1982b)  and  Luenberger  (1984),  for  example. 

Due  to  the  convexity  of  q(h,x),  we  may  employ  the 
standard  mathematical  result  that  x  is  optimal  in  q(h,x)  if 
and  only  if  first-order  stationary  conditions  are  met: 

Vq(h,x) (x-x)   >   0  (3.3) 

for  all  X  €  F.    For  Vq(h,x)(x-x)  <  0,  (x-x)  represents  an 
improving  direction. 

The  following  penalty  function  characteristics,  stated 
in  terms  of  PP,  are  taken  from  Luenberger  (1984) .  Let  {h^} 
and  (x'^}  be  sequences  of  penalty  parameters  and  associated 
optimal  solutions,  with  h^"*"^  >  h^,  and  h°  >  0 .   Then 

q(hJ^,x^)   <   q(hJ^+l,xJ^+l) 
P(x^)   >   P(x^+1) 
cx^  <   cx^"*"^ 


and 


ex*   >   q(h^,x^)   >   cx^  , 


.* 


where  x  is  optimal  in  MCTP.  Thus,  solving  the  penalty 
problem  for  any  h  =  h^  >  0  produces  a  lower  bound  on  MCTP 
and 

lim  q(h^,x^)   =   ex*   with   x^   c   x*  . 
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This  large-scale  nonlinear  version  of  the  original 
linear  problem  is  useless  without  an  effective  means  of 
solution.  We  base  our  solution  algorithm  on  the  promising 
method  of  restricted  simplicial  decomposition. 

Restricted  simplicial  decomposition  (RSD)  was  developed 
by  Hearn  et  al.  (1984)  to  solve  large-scale  pseudo-convex 
nonlinear  optimization  problems.  The  method  decomposes  the 
original  problem  into  a  nonlinear  master  problem  and  a  set 
of  linear  subproblems  (assuming  linear  constraints) .  At 
each  iteration,  the  subproblems  generate  a  new  extreme  point 
of  the  feasible  region  for  the  master  problem,  and  the 
master  problem  minimizes  the  original  objective  function 
over  a  simplex  of  retained  extreme  points.  The  master 
problem  in  turn  provides  new  cost  information  to  the 
subproblems  via  Vf(x)  where  f  is  the  objective  function  of 
the  master  problem  and  x  is  the  current  solution.  The 
process  terminates  when  no  favorable  extreme  points  remain. 
It  is  termed  "restricted"  since  only  a  fixed  number  r  of 
extreme  points  are  retained  at  each  iteration. 

When  r  =  1,  RSD  specializes  to  the  algorithm  of  Frank 
and  Wolfe,  1956.  In  this  case  the  master  problem  becomes  a 
minimization  on  the  line  joining  the  current  point  x  and  the 
extreme  point  just  generated.  Many  researchers  have  noted 
the  slow  convergence  of  the  Frank-Wolfe  algorithm  (Ali,  et 
al.,  1978;  Meyer,  1974;  Wolfe,  1970);  a  simple  example  taken 
from  Wolfe  (1970)  makes  this  evident.   Suppose  we  are  trying 
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to  find  the  point  of  a  polytope  which  is  closest  to  the 
origin:  in  Figure  3.1,  this  is  the  midpoint  of  the  base  of 
the  triangle.  At  each  iteration,  linearizing  the  objective 
function  and  solving  the  resulting  LP  leads  to  vertex  B  or  C 
of  the  triangle.  Thus,  as  the  algorithm  progresses,  the 
line  search  proceeds  along  a  direction  nearly  orthogonal  to 
the  gradient  Vf(x'^)  causing  zigzagging.  However,  simply  by 
retaining  two  extreme  points  and  optimizing  over  the  simplex 
formed  by  the  extreme  points  and  the  current  iterate,  the 
problem  in  Figure  3.2  is  solved  optimally  on  the  second 
iteration.  Indeed,  Hearn's  computational  results  show 
significant  improvement  as  r  is  increased  from  1  (Hearn,  et 
al.,  1984). 


B 


Figure  3.2   Frank-Wolfe  Zigzagging  Example 
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Using  this  method  on  PP  decomposes  the  problem  into  a 
master  problem,  which  minimizes  q(h,x)  over  the  set  a 
retained  extreme  points,  and  a  subproblem  consisting  of  a 
set  of  decoupled  network  flow  problems  with  modified  prices. 
The  master  problem  is  simple  to  solve  since  it  has  only  a 
convexity  constraint  like  (3.3)  and  nonnegativity 
constraints  on  a  number  of  variables  only  equal  to  the 
number  of  retained  extreme  points.  Furthermore,  Hearn  shows 
that  the  method  is  convergent  even  if  the  master  problem  is 
only  solved  approximately. 

The  decomposition  of  the  penalized  MCTP  is  formed  in  the 
following  manner.  Let  X  be  a  matrix  whose  columns  are 
retained  extreme  points  of  F,  and  r  be  the  number  of  points 
retained.   Then  the  master  problem  becomes 

(MP)  min  q(h^,X^w)  =  cX^w  +  P(h^,X^w) 

st:       l*w  =  1 

w  >  0  . 

This  resembles  the  Dantzig-Wolfe  master  problem,  except  that 

the  penalized  joint  capacity  constraints  now  appear  in  the 

r 

objective  function.   Let  x^  =  ),  ^n^^n  ~  ^^^  ^®  "^^^  optimal 

n=l 

solution  to  MP  at  iteration  k,   and  Jy  be  the  set  of 
constraints  violated  at  x^.   The  corresponding  subproblem  is 

(SP)  min   q(h^,x^)x  (3.4) 

st     X  e  F  (3.5) 
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where 

vq(h^,x^)  =  c+VP(h^,x^)  =  c+h^Q  (x^)  Vq(x^)  =  c+h^(Ax^-b)  "^A  . 

Comparing  this  form  to  the  objective  of  the  Dantzig-Wolfe 
subproblem  (3.5)  reveals  that  -h'^CAx'^-b^)  "^  is  an  estimate 
for  the  dual  multipliers,  u^^.  Thus,  we  make  a  new  estimate 
of  the  optimal  multipliers  based  on  the  violations  in  the 
current  master  problem.  To  avoid  notational  confusion,  we 
let  u^^  =  h^(Ax^-b-^)  ■*"  in  the  rest  of  this  section.  Now,  if 
x^  solves  SP  and 

Vq(h^,x^) (x^-x^)   >   0  , 

then  x^  solves  PP  for  h  =  h^;  otherwise  (x^-x^)  is  a 
favorable  direction,  so  we  add  x^  to  the  retained  extreme 
points  and  return  to  the  master  problem. 

If  x'^  is  optimal  in  PP  for  h  =  h^  and  x*  is  optimal  in 
MCTP,  then 

q(h^,x^)   <   ex*  ,  providing  a  lower  bound  on  MCTP. 

However,  since  x'^  is  probably  not  an  extreme  point  of  F,  we 
must  identify  the  facet  of  F  containing  x^  and  solve  MP 
exactly  to  establish  this  bound.  We  use  other  information 
to  provide  intermediate  bounds. 

Recall  that  the  Lagrangean  relaxation  of  the  linear 
program,  with  u^^  fixed,  may  be  written 
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LR(U]^)         min  (c  -  U]^A)x  +  u^^b^ 

St    X  e  F 
Using  the  set  Jy,  associated  with  any  solution  to  the  master 
problem,  we  let 


/   0   if   j  /  Jy  (3.9) 

(   h(A-;X-bi-i)t-l   if   j  £  Jv  , 


and  rewrite  SP  as 

min   (c  +  U]^A)x 
st   X  £  F  . 

Thus,  we  observe  that 

V(ui)   =   Vq(h^,x^)x^  -  u^bi  ,  (3.10) 

so  at  each  iteration  we  compute  UQ^b^^  as  we  compute  new 
subproblem  costs  and  adjust  the  global  lower  bound  whenever 
(3.10)  exceeds  the  current  lower  bound. 

Since  the  penalized  problem  is  convex,  it  is  also 
possible  to  linearize  the  objective  function  at  x^  to 
establish  lower  bounds  via 

q(h^,x^)  +  Vq(h^,x^) (x-x^)  <  ex*  . 

However,  the  following  lemma  establishes  that  the 
linearization  is  always  dominated  by  the  value  of  the 
Lagrangean  relaxation  with  u^^  =  h(  (Ax^-b^) '^)  ^~'^.   For  ease 
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of  presentation  here,  we  use  the  non-standard  vector  form 
[  (Ax^-b ]_)"•"]  ,  meaning  [max(0,  AjX^-b^^j  )  ]  ^  for  each  element  in 
the  vector,  and  we  drop  k  from  x^  and  U;|l^* 

Lemma  3.8:   Let  u^  =  h(  (Ax-b]^)  "^j"^"!  at  point  x  e  F.   Then 

/\  /\  /\ 

q(h,x)   +  Vq(h,x) (x^-x)  <  V(ui)   <  ex*,  where 
x^  =  argmin  (Vq(h,x^)x  st  x  e  F} 

Proof:  For  u^  =  h{  (Ax-b^)  "*"]^"-'-,  vq(h,x)  =  (c+u^A)  .  Now 
q(h,x)  +  V  q(h,x)  (x^-x)  =  ex  +  P(h,x)  +  cx'^  + 
VP(h,x)x^  -  ex  -  P(h,x)x  =  cx^  +  P(h,x)  + 
Vp(h,x) (x^-x)  =  (c+uiA)x^  +  P(h,x)  -  VP(h,x)x  since 
VP(h,x)  =  h[ (Ax-bi)+]t"lA  =  u^A.  By  definition, 
VCu-"-)  =  (c+UiA)x^  -  u^b^  <  ex  ,  so  our  result  is 
proved  if  VP(h,x)x  -  P(h,x)  >  u^b]^.  Now  VP(h,x)x 
-  P(h,x)  =  u^Ax  -  ^h([ (Ax-b)+]t-l)  •  (Ax-b)+  =  u^Ax- 
z-  u^  (Ax-b]^)  "•".  But,  since  u^j  =  0  whenever  AjX-b^^j 
<  0,  we  see  that 


/\  /\ 


1  ."^  />r.  i_  x+  _  _/t-l,.'^  ,r.  .  1: 


/\  /N 


u^Ax  -  -u^CAx-bi)"^  =  n(— — )uiAx  +  ^r^ib]^  . 

Since  u^^AjX  >  u^^jbj  for  all  j,  then  (— ^)U2^Ax  + 
u^b]^  >  u^b^.   QED 

The  (relaxed)  penalty  form  of  MCTP  produces  a  sequence 
of  infeasible  x's  which  is  only  guaranteed  to  converge  to  x* 
in  the  limit.  Therefore,  the  penalty  algorithm  does  not 
naturally  produce  intermediate  feasible  solutions  or  upper 
bounds.   In  general,  this  is  unacceptable,  so  we  rely  on  the 
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improving  nature  of  the  sequence  of  x's,  using  projection 
and  restriction  to  generate  intermediate  feasible  solutions 
and  upper  bounds  for  MCTP. 

Suppose  h  is  fixed  and  x  solves  min  q(h,x)  for  x  e  F. 
Then,  if  x^  solves  the  master  problem  at  iteration  k,  either 
x^  =  X  or  a  favorable  extreme  point  is  added  to  the  master 
problem  with 

q(h,x^+l)   <   q(h,x^)  . 

If  x-*^  =  X,  increasing  h  to  h'  >  h  and  resolving  MP  over  the 
same  extreme  points  yields 

q(h,x^)   <   q(h',x^+l)  ' 

By  sampling  the  sequence  periodically  as  x'^  ^  x  and  forming 
a  capacity  allocation  according  to  (2.11),  y'^  =  CA(x^)  (with 
scaling  to  allow  non-integer  solutions) ,  we  will  form 
allocations  in  RS(y)  which  improve  as  x  improves,  thus 
obtaining  improving  feasible  solutions  and,  therefore,  upper 
bounds.  These  resource  allocations  may  be  performed 
whenever  desired;  we  choose  to  do  an  allocation  every  r^^ 
iteration  where  r  is  the  number  of  retained  extreme  points. 

The  following  lemma  shows  that  the  allocations  must 
eventually  converge  to  an  optimal  allocation. 

Lemma  3.9.   Let  (x^)  be  a  sequence  solving  MP  as  (h^)  ->-  oo, 


yk  =  cA(x^)  ,  and  Xy^  solve  RS(y^).   Assumi 


ng 
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the  solutions  are  scaled  to  provide  integer 
allocations,  as  h^ ->  oo  ,  V(y^)  ^  V*  and 
Xy^  -^  X*. 

Proof:  lim  x^  =  x*  for  the  penalty  problem,  where  ex*  is 
optimal  in  MCTP  (Luenberger,  1984,  Chapter  12). 
Since  y^  =  CA(x^)  ,  as  h  ^  °°  ,  y*  =  lim  y^  ^  CA(x*). 
Since  x*  is  feasible,  yp j *  >  Xp j *  for  all  p  and  j. 

[y*    <    ex*.       But   Xy 

cxy*  =  cx*,  and  Xy*  solves  MCTP.   QED 


Thus  V(y*)  =  cx^*  <  cx*.   But  Xy*  is  feasible  in 
MCTP  by  construction,  so  cx^*  >  cx*.    Therefore 


The  algorithm  proceeds  by  iteratively  solving  the 
subproblems  and  master  problem,  and  periodically  solving  the 
restriction  to  generate  new  feasible  solutions.  The  value 
of  h  is  increased  whenever 

1)  Vq(h,x^) (x^-x^)   >   0, 

2)  RS(y)  will  be  solved  in  the  current  iteration,  or 

3)  min  q(h,x^)   <   V. 

The  algorithm  terminates  when  a  feasible  solution  is 
generated  in  the  master  problem,  since  it  must  be  optimal, 
or  when 

(V  -  V)/V  <  e 

for  some  e  >  0 . 

The  algorithm  uses  the  following  additional  notation: 
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X^   =  matrix  of  retained  extreme  points  at  iteration 

x^   =  current  master  problem  solution 

Xj^   =  previous  MP  solution  considered  in  current  MP 

X^   =   a  matrix  formed  by  augmenting  X^  with  Xj^,  written 
X^  u  Xm 


conv(X)  =  convex  hull  of  X 


.th 


Jy   =   set  of  capacities  violated  by  the  k*-^  solution  to 
MP 

a    >   1.0,  a  scalar  multiplier  for  increasing  h 

X    =   current  incumbent  for  MCTP 

t    =  power  of  the  penalty  function,  1  <  t  <  2 

r    >   1,  integer;  maximum  number  of  retained  extreme 
points 

e    >   0,  a  stopping  parameter 

CA(x)   =   a  capacity  allocation  based  on  x  using  equation 

(2.11) 
V,V  =   current  upper  and  lower  bounds  on  MCTP. 

The  Algorithm  RSD(P) : 

Input:  The  network  T  =  {I, J},  and  joint  capacity  vector,  b^ 
and,  for  products  p=l,...,|P|,  a  cost  vector,  Cp, 
and  supply/demand  vector,  b2 

Output:  Incumbent  solution  x,  and  incumbent  value  ex. 

step  0  (Initialize) :   Select  h°  >  0,  e  >  0,  a  >  1,  r  >  1, 
set  k  =  0 

Solve  LR(0),  with  x°  optimal;  set  V  =  V(0) 
Form  Jv°  =  (JIAjxO  -  b^j  >  0} 
If  Jy°  =  ^,  stop  with  x°  optimal 
Else,  set  y  =  CA(x°) 
Solve  RS(y),  with  Xy°  optimal;  set  V  =  V(y) . 
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If  (V  -  V)/V  <   e  stop  with  incumbent  x  =  Xy° 
Else,  set  X°  =   ^,    Xj^  =  x° 

step  1  (Solve  Subproblem) 

Let  x^   =  arginin{V  q(h^,x^)x|  X  e  F} 

where 

Vq(h^,x^)  =  (c+UiA)  for  Ui  =  h( (Ax^-b^) +) ^"1 
If   q(h^,x^) (x^-x^)  >  0,  x^  solves  MCTPP  for  h  =  h^ 

set  V  =  q(h^,x^) 

If  (V  -  V)/V  <  e,  exit  with  incumbent  x 

Else  h^  =  a-h^ 

Go  to  step  2 . 
Else  compute  Y(ui)  =  Vq(h^,x^)x^   -  u^b^ 

If  v(ui)  >  Y,  V  =  v(ui) 

If  (V  -  V)/V  <  e,  exit  with  incumbent  x 

Else 

i)   If  |X^|  <  r,  X^+l  =  X^  u  x^ 

ii)   If  ix'^l  =  r,  drop  the  column  of  X^  which  had 
the  smallest  Wj^  in  convex  combination  forming 
x^  and  replace  it  with  x^  to  form  X^"''-'- 

Xm  =  x^ 
Set  x'^+l  =  X^+1  ^  Xm 

Step  2  (Solve  Master  Problem) 

x^"*"!  =  argmin  {q(h'^,x),  x    conv(X^)} 
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Discard  all  columns  of  X^"*"^  with  w^  =  0 
Form  the  set  Jy^"^"^  =  (j|AjX^+l  -  b^j  >  0} 

If  k  is   an   integer   multiple   of  r,       do   a   resource 
allocation: 

Set  y  =  CACx^"*"!) 

Solve  RS(y),  with  Xy^''"^  optimal 
If  V(y)  <  V,  set  V  =  V(y),  x  =  Xy^+l 
If  (V  -  V)/V  <  e,  exit  with  incumbent  x 
Else  k  =  k+1 
Go  to  step  1. 

One  possible  difficulty  with  the  penalty  form  of  the 
objective  function  is  that  only  the  prices  associated  with 
the  currently  violated  set  of  joint  capacity  constraints  are 
adjusted  at  each  iteration.  There  is  no  intermediate 
adjustment  of  multiplier  values  for  constraints  previously 
but  not  currently  violated.  Furthermore,  convergence  to 
optimal  multiplier  values  for  MCTP  is  guaranteed  only  in  the 
limit,  that  is,  as  h  -^  °°.  These  potential  problems  are 
overcome  by  the  augmented  Lagrangean  form  of  MCTP.  As  the 
name  suggests,  the  objective  function  is  formed  by  adding  a 
penalty  term  to  the  standard  Lagrangean  dual  form  (Hestenes, 
1975) .  Since  the  theory  of  this  augmentation  is  developed 
in  terms  of  equality  constraints,  we  express  the  joint 
capacitation  constraints  of  MCTP  as 

(AjX-b^j)  +  mj^  =  0    for   j  e  J  . 
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The  augmented  Lagrangean  problem  is  then 

min  L(x,U]L/h) 
m,  X 

=   ex   +   I       [uij{ (Ajx-bij)+mj2}+lh|AjX-bij+mj2|2]^       x   c    F 
jej 

with  U]^,    h   >    0. 

Bertsekas  (1982b)  demonstrates  that  an  equivalent 
objective  not  requiring  the  variables  m  is 

L(x,ui,h)  =  ex  +      I       { [max(0,Uij+h(AjX-bij) ) ]2-uij2}  . 

jeJ 

This  amounts  to  a  multiplier  update 

ui^'^'l  =  max(0,ii^  +h(AjX-bij)}     for  all   j  =  J  . 

Using  vector  notation,  the  problem  is  stated  as 

min   L(x,U3^,h)  =  cx  +  P(x,U3^,h) 
st    X    F 

where   P(x,Ui,h)  =  2h(l  l^l' I  1^  "  I  l^ll  1^) 
and      u^'  =  [ui   +   h  (Ax-b^^)  ]"*"  . 

By  analogy  to  the  penalty  version  of  the  algorithm  it  is 
easy  to  see  that  the  subproblem,  min  VL(x,U]^,h)x  for  x  >  F 
may  be  restated  as  min  (c+u^^A)  x,  st.  x  £  F,  and  a  lower  bound 
on  MCTP  is  derived  by  computing  min  Vl(x^,U]^  ,h)  x^-u^^bj^ . 

Bertsekas  points  out  certain  advantages  to  solving  this 
augmented  Lagrangean  form  of  the  problem.  First,  the  method 
will  converge  to  an  optimal  solution  for  a  finite  value 
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of  h.  Second,  the  rate  of  convergence  of  this  augmented 
foirm  is  superior  to  the  penalty  form,  which  depends  on  the 
rate  of  increase  of  h,  thus  substituting  convergence  rate 
for  ill-conditioning  concerns  in  the  master  problem. 

We  conclude  this  section  by  demonstrating  that  RSD(P)  is 
a  convergent  algorithm.  Hearn,  Lawphongpanich  and  Ventura 
(1985)  present  a  proof  that  RSD  either  terminates  in  a 
finite  number  of  iterations  or  generates  a  sequence  (x^} 
for  which  any  subsequent ial  limit  is  a  solution.  For  the 
penalized  version  of  the  MCTP  the  proof  may  be  simplified, 
relying  on  the  simple  assumption  of  a  bounded  feasible 
region  (i.e.,  0<Xp<b]^<°°. 

In  preparation  for  the  main  result,  we  first  show  that 
solving  the  master  problem  for  fixed  h  produces  a  sequence 
of  improving  solutions  (Hearn,  et  al,  1985). 

Lemma  3.10:   Let  x^  solve  the  restricted  master  problem  at 

iteration  k,  with  h  =  h.    If  x-*^  is  not 
optimal  in  PP,  then  q(h  ,x^''"^)  <  q(h  ,x^)  . 

Proof:  At  iteration  k,  x^  is  the  current  iterate  and  x^ 
solves  the  subproblem  with  \^(h,x^)  (x^-x^)  <  0. 
Thus  x^  e  X^+l  and  X^"'"^  =  X^"*"!  u  x^.  Now  let 
x^"*"^  =  argmin{q(h,x)  s.t.  x  e  conv(X^'''^)  } .  Then, 
q(h,x^''"l)  <  q(h,x^)  since  x^  e  X^"^^.  Now,  if 
q(h,x^"'"-^)  =  q(h,x^),   then  x^  also  minimizes  the 
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master  problem  at  iteration  k+1  and  7q(h, x^) (x-x^) 
>   0  for  all  X  e    X^"*"^.   But  since  x^  e  X^'^^ ,    this 
contradicts   the  assumption  that   (x^-x^)   is   an 
improving  direction.   QED 

The  main  convergence  result  is  stated  as  follows: 

Lemma  3.11:   Let  F  be  a  bounded,  non-empty  set  of  feasible 

single  commodity  network  flows.  The  RSD(P) 
algorithm  either  solves  PP  for  h  =  h  in  a 
finite  number  of  iterations  or  converges  to 
an  optimal  solution  x  as  the  limit  of  an 
infinite  sequence.  Furthermore,  as  h  is 
increased  to  °o  ,  q(h,x)  ->  ex*  and  (x)  ^  x*, 
where  x*  is  an  optimal  solution  to  MCTP. 

Proof:  Let  x  solve  min  q(h,x)  st  x  :  F.  Since  F  is  non- 
empty and  bounded,  choose  any  starting  point 
x°  .;  F,  giving  q(h,x)  <  q(h,x°)  <  °°.  Then  at  any 
iteration  either  x^  =  x  or 

q(h,x)  <  q(h,xk-»-l)  <  q(h,xl^)  <  q(h,xO)  . 

—  /\  —  ^\ 

Letting  d^  =  q(h,x^)  -  q(h,x^''"^),  then   y   d^  = 

k=0 

q(h,x°)  -  q(h,x),  so  lim  d^  =  0.   Since  only  x 

varies,  lim  Ix^-x^"*"-^!  =  0  and,  since  lim  q(h,x'^)  = 
k-»-oo  k->-o° 

~  ~~         ^v 
q(h,x),  then  (x^)  -*  x. 

Since  x    F,  we  set  h  to  h  >  h  and  continue 

the  algorithm.   By  Lemmas  1  and  2,  Section  12.1  of 
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Luenberger  (1984),  q(h,x)  <  q(h,x)  <  ex*.  Finally 
by  Section  12.1,  Theorem  1  of  Luenberger  (1984) 
letting  h  ^  °°  ,  we  have  lim  q(h,x)  =  ex*  and 
(X}  ^  X*.   QED 


It  is  not  necessary  to  solve  q(h,x)  to  optimality  for 
each  value  of  h  to  obtain  convergence  since  all  extreme 
points  of  the  current  simplex  are  elements  of  F.  Thus, 
whenever  it  is  desirable,  we  may  set  a  new  h  >  h  and  resolve 
the  master  problem  with  new  objective  function  min  q(h,x) 
over  the  same  simplex  to  obtain  a  new  x^.  The  subproblem 
costs  are  then  based  on  Vq(h,x^) .  Appropriate  opportunities 
to  increase  h  have  already  been  discussed. 

The  preceding  proof  demonstrates  that  the  algorithm  is 
theoretically  convergent  without  resource  directive  capacity 
allocations.  Indeed,  Bertsekas  has  shown  that,  for  proper 
choice  of  sequence  (h),  penalty  algorithms  may  be 
quadratically  convergent  (1982b) .  However,  in  practice, 
awaiting  convergence  for  large  problems  may  be  impractical. 
Therefore,  we  use  the  resource  allocation  steps  to  help 
obtain  near-optimal ity  quickly.  As  with  the  resource- 
directive  approach  discussed  in  Chapter  II,  we  always  employ 
simple  reallocation  to  obtain  the  best  available  solutions. 
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IV.   COMPUTATIONAL  EXPERIENCE 

The  algorithms  presented  in  Chapters  II  and  III  have 
been  coded  in  FORTRAN  and  tested  on  various  versions  of  a 
large  scale  MCTP  using  an  IBM  3  03  3AP  under  both  VM  and  MVS 
operating  systems.  For  comparison,  some  four  and  ten 
commodity  problems  have  been  solved  using  the  X-system  of 
Brown  and  Graves  (1974),  which  exploits  the  generalized 
upper  bound  structure  of  the  complicating  constraints  in  the 
MCTP  and  employs  advanced  starting  solutions  of  the  network 
constraints.  These  control  problems  were  solved  using  an 
IBM  3081K,  which  is  approximately  twice  as  fast  as  the 
3033AP  for  the  X-system. 

Throughout  this  chapter  the  following  acronyms  are  used: 
DDC    =  dual  decomposition, 

RDLB    =  resource  direction  with  Lagrangean  lower  bounds, 
RSD    =  restricted  simplicial  decomposition, 
RSD(A)  =  RSD  of  the  augmented  Lagrangean  form,  and 
RSD(P)  =  RSD  of  the  penalty  form. 

The  particular  test  problems  used  are  described  in  Sec- 
tion A,  issues  concerning  individual  procedures  are  studied 
in  Section  B,  and  comparisons  among  algorithms  are  presented 
in  Section  C.  Whenever  the  word  "gap"  is  used  to  describe 
the  quality  of  a  solution,  it  means  (V-V)/V*,  where  V,  V, 
and  V*  are  the  upper  and  lower  bounds,  and  optimal  solution. 
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A.   TEST  PROBLEMS 

We  demonstrate  our  methods  on  a  transshipment  model  for 
delivery  over  time  of  military  products  from  production  and 
storage  locations  to  overseas  locations  to  support  theatre 
operations.  The  model  covers  five  physical  echelons, 
including  production  plants,  storage  depots,  ports  of 
embarkation,  ports  of  debarkation,  and  geographic  field 
locations.  Road,  rail,  sea,  and  air  transportation  are 
modelled,  and  product  demands  are  time-phased.  Capacitation 
occurs  primarily  on  sea  and  air  links,  and  as  throughput 
capacities  on  transfer  points,  requiring  replication  of  some 
echelons. 

The  objective  of  the  model  is  to  minimize  deviation  from 
on-time  deliveries.  This  is  accomplished  through  a 
specified  set  of  backlogging  arcs  with  graduated  penalties 
and  a  system  of  penalized  "artificial"  arcs  which  direct 
flow  around  the  network  to  satisfy  "undeliverable"  demands 
and  to  equilibrate  supply  and  demand.  The  products  all  use 
a  common  unit  of  flow  and  incur  a  common  cost  on  each  arc. 

The  network  is  abstracted  in  Figure  4 . 1 — a  more  detailed 
description  of  the  model  may  be  found  in  Staniec  (1984) . 
Computational  tests  use  an  underlying  network  of  3,300  nodes 
and  10,4  00  arcs,  of  which  1,071  are  subject  to  non-redundant 
joint  capacity  constraints.  A  set  of  basic  test  problems  of 
between  four  and  ten  products  is  used  for  detailed  inter- 
and   intra-method  testing.     Results  of  these  tests  are 
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reported  together  in  the  following  sections.  Three 
competitive  methods  are  tested  on  a  final  100  commodity  test 
problem  in  the  last  section  to  illustrate  the  effectiveness 
of  these  methods  on  a  typical,  large-scale  problem. 


AMMO 
PLANTS 


DEPOT  DEPOT 

INIT.,  INV. 


poe 


POC 


POO 


POO  GLOC  GLOC' 


PRODUCTION 


Figure  4.1   Simplified  Ammunition  Distribution  Model 

The  scope  of  our  computational  study  is  limited  to 
problems  drawn  from  this  one  generic  model,  but  we  think 
that  these  problems  are  generally  quite  difficult  to  solve. 
Because  of  the  explicit  system  of  penalized  artificials,  the 
costs  range  over  five  orders  of  magnitude,  which  have  the 
potential  to  cause  difficulty  in  real  arithmetic.    Also, 
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because  the  model  has  multiple  shipping  modes,  covers  nine 
spatial  echelons,  21  time  echelons,  and  has  a  large 
backlogging  system,  repair  of  joint  capacity  violations 
requires  extensive  effort.  Therefore,  these  test  problems 
present  a  rigorous  challenge  for  the  solution  algorithms. 

We  have  graded  the  basic  test  problems  used  as  easy  (E) 
or  hard  (H)  based  on  two  criteria.  The  first  criterion  is 
the  number  of  violations  encountered  in  the  first 
relaxation.  This  parallels  the  criterion  of  Ali,  et  al. 
(1984)  in  evaluating  direct  simplex  methods  used  on  some 
medium-sized  MCTPs:  the  number  of  tight  joint  capacity 
constraints.  The  second  criterion  is  the  magnitude  of  the 
initial  violations,  which  is  an  indirect  measure  of  the  flow 
changes  required  to  reach  the  optimal  solution.  Together 
these  criteria  affect  the  size  of  the  gap  between  the 
initial  finite  upper  and  lower  bounds.  This  is  evident  in 
the  RSD  and  RDLB  procedures,  where  the  initial  feasible 
solutions  are  based  upon  allocations  from  the  first 
relaxation:  the  larger  the  violations,  the  poorer  the 
initial  allocations.  Among  the  problems  we  test,  the  most 
difficult  is  lOH,  on  which  RSD  and  RDLB  generate  a  56% 
initial  gap. 

Test  problem  specifications  and  direct  solution  times  by 
the  X-system  are  presented  in  Table  4.1.  Initial  gaps  are 
from  the  RSD  algorithm. 
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TABLE  4 . 1 


OPTIMAL  SOLUTIONS  FOR  BASIC  TEST  PROBLEMS 


PROD.   INIT.   OPTIMAL  SOL'N 
GRADE    GAP 

CU 


INIT, 


LARGEST 


X-SYSTEM 


VIOL.   VIOL./     TIME  (SEC*) 
ARC  CAP.   COLD     HOT 


4E 

1 

lOE 

37 

4H 

14 

6H 

35 

8H 

45 

lOH 

54 

( 

*IBM  3081K) 

340,916,981 
367,135,103 
130,739,585 
138,525,451 
143,526,951 
169,532,339 


3      844/1458    1021 
11   4,998/10,500   2586 

9   7,185/10,500   461.7  496.5 

13  10,048/10,500  3015.7  581.0 

15  13,861/10,500  2431.3  1067.6 

20  13,861/10,500  5352  1393.5 


The  4-cominodity  problems  have  approximately  13,200 
constraints  and  41,600  variables:  the  10-commodity  problems 
have  about  33,000  constraints  and  104,000  variables.  Hot 
start  solutions  use  a  pure  network  basis  crash,  and  all 
these  solutions  factor  the  joint  capacitation  constraints  as 
generalized  upper  bounds.  However,  pure  network 
factorization  is  not  employed. 

The  principal  motive  for  these  runs  is  to  establish 
completely  optimal  yardsticks  for  the  competing  indirect 
methods. 
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B.   INTRA- ALGORITHM  STUDIES 

We  first  discuss  the  selection  of  parameter  settings  for 
the  RSD  algorithms,  and  then  we  discuss  issues  involving 
DDC,  including  obtaining  bounded  master  problem  solutions 
and  cut  dropping.  For  RSD(P) ,  the  parameters  of  interest 
are  the  initial  value  of  h,  the  h  multiplier  a,  and  the 
penalty  exponent,  t.  For  RSD(A) ,  we  examine  h  and  a,  fixing 
the  exponent  at  2.0.  In  our  tests,  initial  values  for  h  are 
taken  as  a  fraction  of  the  largest  cost  used  in  the  network, 
in  this  case  a  bypass  penalty  cost  of  62,700.  Values  tested 
in  RSD(P)  were  627.0,  62.7,  and  6.27.  Other  researchers 
(e.g.,  Bertsekas,  1982)  suggest  penalty  multipliers  in  the 
range  of  4  to  10  when  solving  the  problem  optimally  for  each 
value  of  h.  However,  since  we  increase  h  frequently  based 
on  periodic  reallocations  and  lower  bound  adjustments, 
penalty  multiplier  values  between  1.5  and  4  are 
investigated.  Finally,  to  represent  the  range  of  possible 
exponents  we  test  the  algorithm  for  t  =  1.1,  1.5  and  2.  The 
exponent  t  =  1.0  is  not  useful  in  this  algorithm  since  it 
results  in  U]^j  =  h  for  all  j  in  Jy  at  every  iteration. 

Table  4.2  summarizes  response  to  changes  in  the  size  of 
the  penalty  multiplier  in  RSD(P) .  The  results  indicate  the 
upper  and  lower  bounds  and  final  gap  based  on  21  iterations 
of  the  algorithm  for  problems  4H  and  lOH.  A  multiplier  of 
1.5  seems  to  work  best  in  the   smaller  problem,   while   4.0 
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TABLE  4.2 

RESPONSE  TO  CHANGING  PENALTY  MULTIPLIER 

PROD.      MULT.      INIT      EXP  LB  UB  GAP 
a h t  (  10  )  f  10  ) (%) 

4H        2.0        6.27      2.0  130.4  130.74  .273 

1.5         6.27      2.0  130.67  130.75  .061 

2.0         6.27       1.5  129.7  130.74  .78 

1.5         6.27       1.5  130.2  130.76  .44 

lOH       iTo         6.27      iTo  162  170.1  4.79 

2.0         6.27      2.0  163.4  170.6  4.22 

4.0         6.27       1.5  150.2  170.1  11.68 

2.0         6.27       1.5  160.6  170.8  5.99 

4.0       62.7       2.0  156.2  172  9.17 

2.0       62.7       2.0  153.9  171.2  10.09 

4.0       62.7       1.5  163.9  170.6  3.93 

2.0       62.7       1.5  162.1  170.5  4.91 


generally  works  best  in  the  larger  problem.  However,  in 
both  cases,  the  response  is  not  substantially  worse  for  a 
multiplier  of  2.0.  Therefore,  we  fix  the  multiplier  at  2.0 
and  concentrate  further  studies  on  the  other  two  parameters. 
Recalling  that  in  the  subproblems  the  gradient  of  the 
penalty  term  provides  an  estimate  of  the  optimal  Lagrange 
multipliers, 

ui^   =   h[  (Ax^-bi)"*"]t"l  , 

it  is  evident  that  h  and  t  work  in  concert  to  affect  the 
magnitude  of  the  multiplier  estimates  at  each  iteration.  We 
want  a  combination  which  produces  multiplier  estimates  large 
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enough  to  generate  new  extreme  points,  yet  small  enough  to 
allow  good  lower  bound  estimates.  That  is,  as  u^^^b^  grows 
large,  YCU]^^)  degrades  as  a  lower  bound  estimate.  Table  4.3 
and  Figure  4.2  show  the  response  of  RSD(P)  to  changes  in  h 
and  t.  Results  are  for  21  iterations  of  the  algorithm,  with 
the  penalty  multiplier  fixed  at  2.0  for  all  cases.  In  each 
case  we  retain  a  maximum  of  seven  extreme  points  plus  the 
current  master  problem  solution,  and  perform  a  primal 
resource  allocation  every  seventh  iteration. 

We  summarize  the  computational  results  as  follows. 
First,  there  is  a  distinct  advantage  to  starting  with  a 
small  penalty  parameter.  Initial  values  of  h  =  (max.  arc 
cost)  X  10"-^  or  10"^  provide  superior  overall  performance  in 
the  test  problems.  Second,  the  penalty  exponent  t  =  1.1 
shows  a  definite  overall  disadvantage  to  values  of  1.5  and 
2.0.  Both  t  =  1.5  and  t  =  2  perform  well  for  an  appropriate 
matching  value  of  h,  but  we  favor  t  =  1.5  because  it 
provides  the  best  overall  final  results,  apparently  reducing 
the  dominance  of  the  most  grossly  violated  arcs  when 
U]^  =  h(  (AjX-b]^)  ■^)  •  ^  IS  computed. 

Figure  4.2  graphically  displays  the  interactions  between 
h  and  t  in  RSD(P)  by  depicting  upper  and  lower  bounds 
relative  to  the  optimal  solution  of  problem  lOH.  The  heavy 
black  lines  suggest  that  the  best  response  is  with  h  =  62.7, 
t  =  1.5,  but  it  appears  that   any   choice  of  0  <  h  <  70   and 
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Figure  4.2   Response  to  Changing  Parameters, 
RSD(P)  on  Problem  lOH 


1.5  <  t  <  2  will  perform  well.   Note  that  the  upper  bound  is 
nearly  optimal  throughout  the  suggested  range. 

The  parametric  responses  reported  here  suggest  a  simple 
method  for  selecting  appropriate  starting  parameters  for  the 
penalty  algorithm.  First,  we  choose  the  h  multiplier  a  = 
2.0  and  set  1.5  <  t  <  2.0.  Then  we  make  an  estimate,  u^ j , 
of  the  optimal  dual  variable  associated  with  the  most 
violated  constraint  found  when  LR(0)  is  solved,  and  select  h 
so  that 
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h  *  max[ (AjX-bij)+]t"l   ~   u^j  . 

Better  guesses  of  u^  evidently  evoke  better  performance  of 
the  algorithm.  In  this  case,  we  simply  use  a  fraction  of 
the  largest  cost  as  our  estimate;  more  sophisticated 
estimates  may  be  made  by  examining  differences  in  costs 
between  least  and  most  expensive  paths  through  the  network, 
or  by  solving  a  single  problem  with  aggregated 
multicommodity  supplies  and  demands.  One  point  is  clear 
from  the  data:  it  is  better  to  underestimate  the  best 
initial  value  for  h  than  to  overestimate,  because  the 
algorithm  is  quickly  self-correcting  for  low  estimates. 
That  is,  if  the  initial  penalty  is  so  small  that  it  does  not 
produce  a  new  extreme  point,  the  algorithm  immediately  (and 
repeatedly)  increases  the  penalty  parameter  until  it  does. 
Furthermore,  when  the  initial  multiplier  estimates  are  good, 
the  subproblems  generate  extreme  points  more  closely  related 
to  an  optimal  solution.  Consequently,  both  lower  bounds  and 
primal  solutions  quickly  benefit.  On  the  other  hand, 
excessively  large  penalties  generate  extreme  points  with 
little  relation  to  the  optimal  solution,  producing  poor 
lower  bounds  and  degrading  the  primal  solutions  obtained 
through  resource  allocations. 

Figure  4.3  shows  the  interactions  of  parameters  for 
RSD(A).  In  this  case,  we  use  a  constant  penalty  exponent  t 
=  2.0,  and  vary  h  and  a.  As  indicated  by  the  response 
surfaces  in  the  figure,  the  best  bounds  are  obtained  for  a 
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combination  of  small  initial  penalty  parameters  and  smaller 
multipliers.  We  fix  the  multiplier  at  1.2  in  subsequent 
tests.  Again,  the  initial  value  of  h  may  be  calculated  from 
a  U]^  estimate  using 

h  X  [max(AjX*^-b]^j )  ■'']   ::   u^j  , 

and  the  algorithm  quickly  corrects  for  underestimates  but 
recovers  slowly  from  overestimates. 

The  computational  experience  presented  by  Hearn  et  al. 
(1984)  indicates  that  performance  improves  as  the  number  of 
retained  extreme  points  is  increased.  In  fact,  if  enough 
points  are  retained,  theoretically  the  heuristic  becomes  a 
finite  algorithm.  Performance  in  our  tests  improves  for  up 
to  about  eight  points,  and  then  levels  off.  As  the  number 
of  retained  points  increases,  the  time  spent  in  the  master 
problem  increases.  In  the  following  section,  we  fix  the 
number  of  retained  points  at  eight  and  obtain  quite 
satisfactory  performance. 

Four  performance  issues  concern  us  with  algorithm  DDC. 

The  first  is  reaching  a  bounded  condition  in  the  dual 
master  problem  (or,  equivalently,  reaching  a  feasible 
solution  to  the  Dantzig-Wolfe  master  problem) .  Since  the 
constraint  set  of  the  dual  master  problem  may  be  viewed  as 
forming  a  piecewise  linear  tangential  approximation  of  the 
Lagrangean   dual   function,  we   obtain   no   feasible   primal 
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Figure  4.3   Response  to  Changing  Parameters, 
RSD(A)  on  Problem  4H 


solution  until  the  set  is  bounded.    In  problem  lOH,  for 
instance,  this  requires  6  iterations. 

One  method  to  obtain  early  boundedness  is  to  perform  a 
resource  allocation  from  the  first  relaxation,  as  in  the  RSD 
algorithms,  and  append  this  information  to  the  DDC  master 
problem  as  a  "pseudocut . "  This  method  does  provide  an  early 
feasible  solution  and  the  first  pseudocut  seems  to  remain 
binding  for  many  iterations.  However,  our  results  show  that 
it  does  not   improve  the  point  at  which  boundedness  is 
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naturally  achieved  in  the  master  problem  using  cuts 
generated  by  the  subproblems,  nor  does  it  affect  the 
subsequent  convergence  trajectory. 

Second,  we  must  make  an  intelligent  choice  of  the 
decomposition  tolerance  e  in  the  cut  thresholds  ub+e.  e 
acts  as  a  relaxation  parameter.  If  e  is  too  small,  we 
restrict  the  size  of  our  improvements  in  the  dual 
multipliers  and  inch  uphill;  if  e  is  too  large,  we  overshoot 
good  multiplier  choices  and  may  oscillate  through  a  sequence 
of  extreme  points  which  generate  poor  cuts  and  consequently 
poor  dual  solutions.  Moderation  is  a  virtue  in  the  choice 
of  e.  For  this  set  of  test  problems,  we  initially  set  e  = 
100,000  since  no  dual  variable  is  expected  to  exceed  62,700 
and  obtain  reasonable  performance  by  reducing  e  by  half 
whenever  e-optimality  is  achieved  (i.e.,  whenever  the  cuts 
cannot  be  satisfied) .  A  final  value  ef  terminates  this 
sequence  of  master  problem  relaxations. 

Third,  the  decomposition  goals  are  a  daunting 
complication  at  first  glance.  However,  a  goal  constraint 
for  each  master  problem  variable  can  be  incorporated  as  a 
generalized  upper  bound,  and  the  resulting  master  problem 
solved  with  very  little  additional  effort. 

Finally,  we  must  consider  the  effects  of  accumulating 
too  many  cuts  in  the  master  problem.  As  the  number  of 
retained  cuts  grows,  more  time  and  space  are  required  to 
solve  the  master,  and  more  time  is  required  to  regenerate 
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solutions  from  peripheral  storage.  However  as  the 
approximation  to  the  dual  response  surface  improves,  many  of 
the  earlier  cuts  become  non-binding,  either  temporarily  or 
permanently.  Figures  4.4  and  4.5  show  cut  histories  for 
problems  4H  and  lOH,  respectively.  In  these  figures,  the 
height  of  the  "skyscraper"  indicates  the  weight  of  the  cuts 
at  each  iteration.  Both  results  show  that  only  a  few  cuts 
reenter  a  subsequent  solution  after  once  becoming  slack,  and 
then  only  at  a  small  dual  weight. 
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Figure  4.4 


Active  Cuts  at  Each  Iteration 
DDC  on  Problem  4H 
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We  take  advantage  of  this  property  by  restricting  the 
number  of  cuts  retained  to  n^Qj^j^  =  20.  We  also  use  an 
exponential  moving  average  of  the  previous  cut  weightings 
(master  problem  dual  variables)  to  determine  which  cuts  to 
drop.  In  the  (unlikely)  event  that  a  taut  cut  must  be 
replaced,  the  (upper  bound)  value  of  a  recoverable  primal 
solution  must  be  relaxed  accordingly. 
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^CUT  NUMBER 


Figure  4.5 


Active  Cuts  at  Each  Iteration 
DDC  on  Problem  lOH 
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The  master  problems  are  best  solved  by  using  a  basis 
crash  from  the  preceding  solution  in  the  sequence  (except  in 
the  rare  case  that  a  taut  cut  has  been  replaced) . 

For  our  problems,  the  master  problems  have  as  many  as 
^hold  ~  2^  (dense)  cut  constraints,  and  1,071  decomposition 
goal  (GUB)  constraints,  one  for  each  of  the  1,071  variables. 
Master  problem  solution  times  average  0.2  seconds  (IBM  3  03  3 
AP)  . 

The  pure  network  subproblems  have  3,300  nodes  and  10,400 
arcs,  and  solve  in  about  1  second  per  commodity.  To  reduce 
these  dominating  computation  times,  a  hot  start  mechanism  is 
used  which  initially  restricts  the  network  to  those 
variables  known  to  have  had  positive  flow  in  any  prior 
solution  for  that  commodity.  This  restriction  tends  to 
reduce  solution  time  as  more  experience  is  gained  over 
iterations.  After  about  10  cuts,  network  subproblems  rarely 
use  any  new  arc  not  used  before. 

The  networks  subproblems  are  much  more  expensive  to 
solve  in  DDC  than  the  LP  master  problems.  This  is  reversed 
from  experience  with  primal  decomposition,  for  which  the 
master  problems  are  commonly  mixed  integer  LPs  (e.g.,  see 
Geoffrion  and  Graves,  1974,  or  Brown,  Graves  and 
Honczarenko,  1983) .  Unfortunately,  further  mechanisms  to 
reduce  network  solution  times  require  additional  storage, 
and  we  have  limited  our  implementation  to  operating  with  a 
default  2  M-byte  VM/CMS  region.   Overlays  are  used  for  each 
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commodity  subproblem  and  the  master  problem.  We  do  not  want 
to  limit  the  number  of  commodities  and  have  therefore  used 
no  data  structure  spanning  commodities. 
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C.   INTERMETHOD  COMPARISONS 

We  now  turn  our  attention  to  comparisons  among  the 
various  algorithms.  We  establish  acceptable  solution  gaps 
of  1%  and  4%  for  the  four  and  ten  product  problems, 
respectively,  for  two  reasons.  First,  for  such  large-scale 
planning  models,  these  levels  of  accuracy  are  adequate. 
Second,  in  order  to  maintain  integer  arithmetic  in  the 
network  problems,  we  round  multiplier  estimates  and  capacity 
allocations,  inducing  the  possibility  that  exact  optima  to 
the  original  problem  have  been  excluded. 

Figures  4.6  and  4.7  show  the  significant  computational 
advantage  of  RSD(A)  over  RDLB.   For  both   test   problems   4H 
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Solution  Trajectory  Comparison  (Problem  4H) 
RSD(A)  vs.  RDLB 
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Figure  4.7   Solution  Trajectory  Comparison  (Problem  lOH) 
RSD(A)  vs.  RDLB 


and  lOH,  the  performance  of  the  RDLB  is  much  worse  than 
RSD(A)  because  the  subgradient  reallocations  are  generally 
ineffective.  In  fact,  although  the  subgradient  realloca- 
tions lead  to  an  optimal  solution  for  problem  4E  (not  shown) 
no  improvements  are  found  for  lOH,  and  improvements  totaling 
.007%  out  of  an  11.8%  gap  are  made  in  4H,  only  by  using  an 
exceedingly  small  step  size. 

The  poor  primal  performance  of  RDLB  is  compounded,  by 
slow  convergence  of  the  Lagrangean  problem.  As  indicated  by 
the  figures,  it  usually  requires  accumulation  of  three  to 


103 


five  new  subgradients  before  there  is  enough  information 
available  to  find  a  good  conjugate  direction  and 
substantially  improve  the  lower  bound.  Since  only  a  few 
costs  are  typically  changed  in  solving  the  second  set  of 
network  subproblems  for  the  quadratic  approximation  to  the 
line  search,  we  use  the  previous  bases  as  a  "hot  restart," 
but  the  time  savings  is  not  adequate  to  make  the  algorithm 
competitive  in  performance  with  the  other  algorithms.  Even 
though  other  lower  bounding  strategies  are  available  for 
testing,  no  further  tests  are  performed  on  this  algorithm 
because  of  its  combined  poor  performance  on  both  lower  and 
upper  bounds. 

For  the  two  test  problems  shown,  the  RDLB  method 
requires  significantly  longer  to  do  less.  The  algorithm 
terminates  with  final  gaps  of  5.83  and  10.7  percent,  both 
unacceptable  by  our  stated  standard.  The  primal  bound  is 
relatively  poor  in  both  problems,  corroborating  the  results 
of  other  researchers  (e.g.,  Allen  (1985))  which  show 
subgradient-based  resource  direction  unable  to  achieve  an 
optimal  solution. 

We  now  compare  RSD(A)  to  RSD(P)  (t  =  1.5).  In  Figure 
4.8,  RSD(A)  takes  about  60  seconds  on  problem  4H  to  reach  a 
solution  within  .15%  of  optimal,  while  RSD(P)  reaches  a  .69% 
solution  in  under  120  seconds.  In  Figure  4.9,  the  augmented 
form  reaches  a  4%  solution  to  problem  lOH  in  about  270 
seconds  and  a  3.56%  solution  after  21   iterations   in   about 
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Figure  4.8   Solution  Trajectory  Comparison 
RSD(A)  vs.  RSD(P) ,  Problem  4H 


440  seconds.  The  penalty  version  takes  nearly  470  seconds 
to  reach  the  4%  level,  closing  at  3.93%.  The  X-system  on  an 
IBM  3081K  solves  this  problem  optimally  in  1393.5  seconds 
using  the  network  hot-start  procedure  (5,3  52  seconds 
without!).  Considering  the  difference  in  computer  speeds, 
RSD(A)  reaches  an  acceptable  solution  in  1/10  the  time  of 
the  X-system. 

In  comparing  RSD(A)  and  RSD(P) ,  we  note  not  so  much  the 
difference  in  performance  times  as  the  movement  of  the  lower 
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Figure  4.9   Solution  Trajectory  Comparisons 
(Problem  lOH)  RSD(A)  vs.  RSD(P) 


bounds.  The  bound  for  RSD(A)  climbs  more  quickly  than 
RSD(P)  apparently  due  to  better  instantaneous  multiplier 
estimates  and  the  fact  that  h  is  increased  in  smaller  steps 
in  RSD(A),  resulting  in  a  better-conditioned  master  problem 
at  each  iteration.  Complete  convergence  of  bounds  is  not 
anticipated  for  either  algorithm,  because  we  are  rounding 
the  multiplier  estimates  to  perform  integer  arithmetic  in 
the  subproblems,  we  are  not  scaling  the  subproblems,  and  we 
are  retaining  only  eight  points.    However,  the  solutions 
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obtained  by  both  algorithms  are  adequate,  especially  since 
the  primal  incumbents  invariably  seem  to  be  very  near 
optimal . 

Finally,  Figures  4.10  and  4.11  compare  the  performance 
of  RSD(A)  to  the  performance  of  DDC.  For  both  problems  4H 
and  lOH,  the  RSD(A)  algorithm  significantly  outperforms  the 
dual  algorithm.  While  RSD  takes  about  60  seconds  to  reach  a 
solution  within  .15%  of  optimal  in  4H,  DDC  required  160 
seconds  to  reach  a  1%  solution  and  about  180  seconds  to 
reach  a  gap  comparable  to  RSD (A) . 


-*^gzdt=-M  www 


■  ■»«"•« 


RSD(A)  »  • — • 
DDC    "  * — X 


100  200 

TIME  (SECONDS) 


300 


Figure   4.10 


Solution  Trajectory  Comparison 
RSD (A)  vs.  DDC,  Problem  4H 
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Figure  4.11   Solution  Trajectory  Comparison 

RSD(A)  vs.  DDC,  Problem  lOH 


In  lOH,  the  comparison  is  similar.  RSD(A)  reaches  a  4% 
solution  in  about  180  seconds  and  3.56%  in  440  seconds.  DDC 
terminates  at  650  seconds  after  50  iterations  with  an  8.5% 
gap.  These  results  are  especially  significant  since  the 
dual  decomposition  algorithm  has  a  reputation  for  rapid 
initial  progress.  It  is  evident  from  both  test  problems 
that  the  RSD(A)  algorithm  performed  better  both  on  the  upper 
and  lower  bounds. 


108 


Table  4.4  summarizes  the  final  results  for  all 
algorithms  and  basic  test  problems  run  on  an  IBM  3033AP. 
Notice  that  RDLB  runs  reasonably  well  on  4E,  but  its  times 
and  solution  quality  are  unacceptable  for  the  hard  problems. 
On  the  other  hand,  RSD(A)  produces  near-optimal  primal 
solutions,  acceptable  bounds,  and  fast  times  for  each  test 
problem.  The  dual  decomposition  produces  good  final 
results,  but  through  a  poorer  trajectory,  and  shows  signs  of 
laboring  on  problem  lOH. 

Finally,  we  construct  a  problem  of  100  products  to  test 
DDC  and  RSD(A) .  It  has  31  initial  capacity  violations  with 
a  maximum  violation  of  238%  of  arc  capacity.  Due  to  the 
increase  in  problem  size,  the  initial  value  of  h  is  reduced 
by  a  factor  of  10  in  RSD(A)  :  no  other  changes  have  been 
made.  Figure  4.12  shows  that  RSD(A)  reaches  an  acceptable 
4%  solution  in  about  1000  seconds  and  concludes  21 
iterations  in  3000  seconds  with  a  1.5%  gap.  DDC  terminates 
at  4090  seconds  and  50  iterations  with  a  12.15%  gap 
remaining.  We  note  that  a  previous  effort  on  this  same 
problem  using  a  resource-directive  algorithm  achieved  an 
11.8%  solution  in  1000  seconds  (Staniec,  1984),  but  made  no 
further  improvements  in  an  hour  of  computation.  The  RSD(A) 
algorithm  achieved  an  acceptable  solution  in  less  than  17 
minutes  for  a  problem  of  some  330,000  constraints  and 
1,040,000  variables. 
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Figure  4.12   Solution  Trajectory  RSD(A) 

on  100  Commodities 
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V.   CONCLUSION 

We  have  presented  three  algorithms  for  solving  large- 
scale  linear  programs  that  fall  within  the  general  framework 
of  decomposition,  with  specific  application  to  the  MCTP. 

The  first  algorithm,  a  resource-directive  decomposition, 
has  contributed  a  new,  simplified  method  of  projecting 
subgradient  reallocations  in  the  primal  problems,  and  an 
improved  method  for  terminating  the  algorithm  via  conjugate 
subgradient  directions  and  approximate  line  searches  in  the 
Lagrangean  lower  bounding  routine.  However,  the  method  is 
not  computationally  effective  due  to  inability  to  find 
improving  subgradient  reallocations  and  due  to  computational 
burden  in  the  line  searches. 

The  second  algorithm  is  a  dual  decomposition  using  a 
master  problem  which  is  the  relaxed  dual  of  the  standard 
Dantzig-Wolfe  master  problem.  Tests  show  the  method  to  be 
computationally  effective,  but  with  slow  convergence  for 
very  large  problems. 

The  last  algorithm  (with  two  variants)  is  a  new,  non- 
linear price-directive  decomposition  using  a  penalty 
transformation  of  the  original  problem.  Computational  tests 
show  the  method  to  be  approximately  ten  times  faster  than  a 
direct  simplex-based  algorithm,  and  2-3  times  faster  than 
the  dual  decomposition  tested. 
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The  initial  success  of  the  RSD  algorithm  prompts  several 
areas  for  further  investigation.  First,  the  test  suite  is 
limited.  We  intend  to  test  the  algorithm  on  other  large- 
scale  problems 

Second,  Hearn  et  al.  (1984)  suggest  that  quadratic 
approximations  to  the  master  problem  may  speed  its  solution 
significantly  without  degrading  convergence  rate.  Bertsekas 
(1982a)  has  described  a  projected  Newton  method  for  solving 
non-linear  objectives  in  the  presence  of  simple  constraints. 
We  propose  development  of  a  form  of  the  Bertsekas  algorithm 
which  is  capable  of  handling  the  penalty  transformation  of 
MCTP. 

Third,  it  is  reasonable  to  assume  other  non-linear 
objectives  may  have  attractive  features.  For  instance,  the 
logarithmic  barrier  function  has  recently  been  the  subject 
of  extensive  research  for  solving  linear  programs  (see,  for 
instance,  Gill  et  al.  (1986)  or  Karmarkar  (1984)).  We 
propose  investigating  interior  point  forms  of  the  algorithm, 
with  the  possibility  of  developing  a  hybrid 
interior/exterior  point  algorithm,  taking  advantage  of  both 
penalty  and  barrier  theory  to  solve  linear  programs. 

Fourth,  we  propose  a  hybrid  algorithm  using  the 
augmented  Lagrangean  form  of  RSD  combined  with  dual 
decomposition  to  take  advantage  of  the  best  properties  of 
both  algorithms.  Via  RSD,  we  quickly  generate  good 
estimates  of  the  optimal  dual  multipliers,  and  therefore 
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favorable  extreme  points.  However,  convergence  slows  due  to 
the  restricted  number  of  extreme  points  retained. 
Increasing  the  extreme  point  count  equates  to  increasing  the 
time  spent  solving  the  master  problem.  However,  these 
favorable  extreme  points  may  all  be  used  to  produce  cuts  for 
the  dual  decomposition  master  problem,  which  can  be  solved 
more  quickly  and  precisely  via  linear  programming.  We 
anticipate  the  result  to  be  a  hybrid  algorithm  which  has 
both  favorable  initial  and  terminal  convergence  properties. 

Finally,  we  consider  an  extension  of  the  algorithm  from 
the  shipment  planning  to  a  shipment  scheduling  framework,  in 
which,  for  instance,  we  must  make  binary  decisions  on  sea 
shipment  arcs  to  account  for  shiploads  of  material.  This 
will  entail  surrounding  the  RSD  algorithm  with  a  shell 
capable  of  interpreting  information  to  make  binary 
selections.  One  possible  way  to  accomplish  this  is  to 
develop  a  specialized  version  of  the  cross-decomposition 
algorithm  of  Van  Roy  (1986)  which  incorporates  RSD. 

In  conclusion,  the  result  of  this  research  has  been  a 
new,  computationally  attractive  algorithm  for  large  scale 
linear  programs  using  non-linear  methods.  Also,  through  the 
computational  results  produced,  it  has  documented  some  of 
the  shortcomings  of  the  subgradient  approach  in  resource 
direction.  It  is  evident  that  more  information  (e.g.,  a 
formal  Benders  master  problem)  is  required  to  solve 
difficult  problems  by  resource  direction. 
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APPENDIX 
CONVEXITY  OF  THE  QUADRATIC  PENALTY  FUNCTION 

The  following  lemma  demonstrates  that  the  quadratic 
penalty  form  of  the  MCTP  is  everywhere  convex,  which  makes 
it  appropriate  for  the  application  of  RSD.  Convexity  of 
other  penalty  forms  may  be  proved  in  a  similar  manner. 

Lemma:   The    function,    q(h,x)    is    convex    for   all 
X    F  =  (x|Nx  =  b2,0  <  X  <  b^}. 

Proof:    q(h,x)  =  ex  +  P(h,x),  where 

P(h,x)  =  ^h[  (Ax-bi)"*"]  •  (Ax-bi)"*".  Since  ex  is 
trivially  convex  and  the  sum  of  convex  functions  is 
convex,  we  need  only  show  P(h,x)  to  be  convex;  that 
is,  that 

P(h,x)  >  P(h,x)  +  VxP(h,x) (x-x) ,   for  all  x  e  F 

where  VPx(h,x)  =  h[  (Ax-bj)  "•"]  '  A 

Then  we  define  Jy  =  (j|AjX-b]^j  >  0}  and  revert  to 

summation  form.   P(h,x)  is  convex  if 

j;  |h[(AjX-bij)+]2  >  I       lh(AjT-bij)2  +1         h(AjX-bij)Aj(x-x). 
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Rewriting  the  left  side  as 
h[(AjX-biJ)+]2  =  I         h[(AjX-bij)+]2  +  I         h[AjX-bij)+]2 

and  noting  that  I        h[  (AjX-b^j )  ■*"]  2  >  o  for  all  x 

j/Jv 

we  may  prove  convexity  if 

I    2h[(AjX-bij)+]2    >    I         (lh(AjX-bij)2    +   h (AjX-bij ) Aj (x-x) } 
j  t J^  j  ejy 

or,    considering   additivity   of   convex   functions,    if 
2h[(AjX-bij)+]2    >  2h[(AjX-bij)+]2    +   h (AjX-bij ) +Aj (x-x) 

for   all    X    -   F,    and   j     ^  Jy. 

If  (AjX-bij)"*"  =  0,  the  right  side  of  the 
inequality  is  0,  and  h[  (A-jX-b^^-; )  """J  2  >  o  trivially. 
Now,  for  (AjX-bj^j)"^  >  0,  letting  Uj  =  AjX-b]^j, 
dropping  h  we  write 

1-9-  -1-- 

2<AjX-bij)'^  +  (AjX-bij)Aj  (x-x)  =  2Uj  (AjX-b^j )  +  Uj(AjX-Ajx) 

=   Uj  [AjX-(bij)+^AjX-bi)-AjX+(bij)  ] 

1    - 
=   Uj[AjX-bij  -2(AjX-bij)] 

=   Uj  (AjX-bij)  -  2Uj^ 

.    1  +9 

Convexity  is  proved  if  2   [  (AjX-b]^ j )  ^] -^   >  Uj(AjX- 


bij)  -  2^j2 


Two  cases  occur: 
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For  all  X  I  (AjX-b^j)"*"  =  0,  then  AjX-b]^j  <  0  and  we 
have   |((AjX-bij)+)2  =  0  >  Uj(AjX-bij)  -2^j^  = 

((AjX-bij)+]2  +  (AjX-bij)+Aj(x-ir)  . 
For  all  X   |  (AjX-bij)"*"  >  0,  we  let  Uj  =  (AjX-b^j) 
and  note  (Uj-Uj)''  >  0. 

But  (uj-Uj)2  =  Uj2  -  2ujUj  +  Uj2  >  0,  and  Uj^  >- 
Uj'^^  +2uj-Uj,  so  multiplying  by  h  and  restoring 
values,  we  have 

|h[(AjX-bij)+]2  >|h[(AjX-bij)+]2  +  h(AjX-bij)+Aj(x-x) 

Since  Pj(h,x)  is  convex  for  all  j  and  the  summation 
of  convex  functions  is  convex,  q(h,x)  is  convex. 
QED 
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